Negative Associations in Word Embeddings Predict Anti-black Bias across Regions, but Only via Name Frequency.

Austin van Loon, Salvatore Giorgi, Robb Willer, Johannes Eichstaedt

Proc Int AAAI Conf Weblogs Soc Media

Published: May 2022

The word embedding association test (WEAT) is an important method for measuring linguistic biases against social groups such as ethnic minorities in large text corpora. It does so by comparing the semantic relatedness of words prototypical of the groups (e.g., names unique to those groups) and attribute words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-Black WEAT estimates from geo-tagged social media data at the level of metropolitan statistical areas strongly correlate with several measures of racial animus, even when controlling for sociodemographic covariates. However, we also show that every one of these correlations is explained by a third variable: the frequency of Black names in the underlying corpora relative to White names. This occurs because word embeddings tend to group positive (negative) words and frequent (rare) words together in the estimated semantic space. As the frequency of Black names on social media is strongly correlated with Black Americans' prevalence in the population, this results in spuriously high anti-Black WEAT estimates wherever few Black Americans live. This suggests that research using the WEAT to measure bias should consider term frequency, and also demonstrates the potential consequences of using black-box models like word embeddings to study human cognition and behavior.
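
The comparison the abstract describes can be sketched with the commonly used WEAT effect size: each target word's mean cosine similarity to one attribute set minus its mean similarity to the other, then the difference in those scores between the two target groups, scaled by their pooled standard deviation. The sketch below is illustrative only and is not taken from the paper: the function names, the toy two-dimensional vectors, and the choice of the sample standard deviation are all assumptions.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attr_a, attr_b):
    """Mean similarity of w to attribute set A minus its mean similarity to B."""
    return mean(cosine(w, a) for a in attr_a) - mean(cosine(w, b) for b in attr_b)


def weat_effect_size(targets_x, targets_y, attr_a, attr_b):
    """Standardized difference in association between target sets X and Y.

    Assumption: the sample standard deviation (n - 1 denominator) is taken
    over the association scores of all target words in X and Y combined.
    """
    s_x = [association(x, attr_a, attr_b) for x in targets_x]
    s_y = [association(y, attr_a, attr_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings, invented for illustration (not real data).
pleasant = [(1.0, 0.0), (0.9, 0.1)]
unpleasant = [(0.0, 1.0), (0.1, 0.9)]
names_x = [(0.8, 0.2), (0.7, 0.3)]  # vectors lying closer to `pleasant`
names_y = [(0.2, 0.8), (0.3, 0.7)]  # vectors lying closer to `unpleasant`

print(weat_effect_size(names_x, names_y, pleasant, unpleasant))
```

A positive value means the first target set sits closer to the first attribute set than the second target set does. Given the abstract's finding, an analysis along these lines would presumably also record how often each target name occurs in the corpus, so that term frequency can be examined alongside the effect size.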

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147343
DOI: http://dx.doi.org/10.1609/icwsm.v16i1.19399

Similar Publications

The Role of Morphological Information in Processing Pseudo-words in Italian L2 Learners: It's a Matter of Experience.

J Cogn

January 2025

Department of Humanities, University of Trento, via Tommaso Gar 14, 38122, Trento, Italy.

Simona Amenta, Francesca Foppolo, Linda Badan

The productive use of morphological information is considered one of the possible ways in which speakers of a language understand and learn unknown words. In the present study we investigate whether, and how, adult L2 learners also exploit morphological information to process unknown words, by analyzing the impact of language proficiency on the processing of novel derivations. Italian L2 learners, divided into three proficiency groups, participated in a lexical decision where pseudo-words could embed existing stems (e.

Uncertainty estimation in diagnosis generation from large language models: next-word probability is not pre-test probability.

JAMIA Open

February 2025

Department of Medicine, University of Wisconsin-Madison, Madison, WI 53792, United States.

Yanjun Gao, Skatje Myers, Shan Chen, Dmitriy Dligach, Timothy Miller

Objective: To evaluate large language models (LLMs) for pre-test diagnostic probability estimation and compare their uncertainty estimation performance with a traditional machine learning classifier.

Materials and Methods: We assessed 2 instruction-tuned LLMs, Mistral-7B-Instruct and Llama3-70B-chat-hf, on predicting binary outcomes for Sepsis, Arrhythmia, and Congestive Heart Failure (CHF) using electronic health record (EHR) data from 660 patients. Three uncertainty estimation methods (Verbalized Confidence, Token Logits, and LLM Embedding+XGB) were compared against an eXtreme Gradient Boosting (XGB) classifier trained on raw EHR data.

Integrating CNN and Bi-LSTM for protein succinylation sites prediction based on Natural Language Processing technique.

Comput Biol Med

January 2025

Thai Nguyen University of Information and Communication Technology, Thai Nguyen City, Viet Nam.

Thi-Xuan Tran, Nguyen Quoc Khanh Le, Van-Nui Nguyen

Protein succinylation, a post-translational modification wherein a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites through experimental methods is often labor-intensive, costly, and technically challenging.

Obfuscated Malware Detection and Classification in Network Traffic Leveraging Hybrid Large Language Models and Synthetic Data.

Sensors (Basel)

January 2025

Department of Computer Science, Al-Baha University, Al-Baha 65779, Saudi Arabia.

Mehwish Naseer, Farhan Ullah, Samia Ijaz, Hamad Naeem, Amjad Alsirhani

Android malware detection remains a critical issue for mobile security. Cybercriminals target Android since it is the most popular smartphone operating system (OS). Malware detection, analysis, and classification have become diverse research areas.

Continuous theta-burst stimulation demonstrates language-network-specific causal effects on syntactic processing.

Neuroimage

January 2025

Max Planck Partner Group, School of International Chinese Language Education, Beijing Normal University, Beijing, China; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.

Chenyang Gao, Junjie Wu, Yao Cheng, Yuming Ke, Xingfang Qu

Hierarchical syntactic structure processing is proposed to be at the core of the human language faculty. Syntactic processing is supported by the left fronto-temporal language network, including a core area in the inferior frontal gyrus as well as its interaction with the posterior temporal lobe (i.e.
