RaceBERT -- A Transformer-based Model for Predicting Race from Names, arXiv - CS - Computation and Language

This paper presents raceBERT, a transformer-based model for predicting race from character sequences in names, and an accompanying Python package. Trained on a U.S. Florida voter registration dataset, the model predicts the likelihood of a name belonging to each of five U.S. census race categories (White, Black, Hispanic, Asian & Pacific Islander, American Indian & Alaskan Native). I build on Sood and Laohaprapanon (2018) by replacing their LSTM model with transformer-based models (a pre-trained BERT model and a RoBERTa model trained from scratch) and compare the results. To the best of my knowledge, raceBERT achieves state-of-the-art results in race prediction using names, with an average F1-score of 0.86, a 4.1% improvement over the previous state of the art, and improvements of 15-17% for non-white names.
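The headline figure above is an average F1-score across race categories. The abstract does not say which averaging scheme is used, so the following is only a minimal sketch that assumes an unweighted (macro) average of per-class F1. The function names `per_class_f1` and `macro_f1`, the label codes, and the toy data are all hypothetical and are not taken from the raceBERT package.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # the true class was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); use 0.0 if the class never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores[label] for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    labels = ["white", "black", "hispanic"]
    y_true = ["white", "white", "black", "hispanic"]
    y_pred = ["white", "black", "black", "hispanic"]
    print(per_class_f1(y_true, y_pred, labels))
    print(macro_f1(y_true, y_pred, labels))
```

A macro average weights every category equally, so it is sensitive to performance on smaller groups. A support-weighted average would instead be dominated by the largest category, which matters when comparing the overall score with the separate gains reported for non-white names.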