We present the Chinese Lexical Database (CLD): a large-scale lexical database for simplified Chinese. The CLD provides a wealth of lexical information for 3913 one-character words, 34,233 two-character words, 7143 three-character words, and 3355 four-character words, and is publicly available at http://www.chineselexicaldatabase.com. For each of the 48,644 words in the CLD, we provide a wide range of categorical predictors, as well as an extensive set of frequency measures, complexity measures, neighborhood density measures, orthography-phonology consistency measures, and information-theoretic measures. We evaluate the explanatory power of the lexical variables in the CLD against experimental data, through analyses of lexical decision latencies for one-character, two-character, three-character, and four-character words, as well as word naming latencies for one-character and two-character words. The results of these analyses are discussed.
DOI: http://dx.doi.org/10.3758/s13428-018-1038-3
J Exp Psychol Hum Percept Perform
January 2025
Faculty of Science & Technology, Department of Psychology, Bournemouth University.
Computational models of eye movement control during reading have revolutionized the study of visual, perceptual, and linguistic processes underlying reading. However, these models can only simulate and test predictions about the reading of single lines of text. Here we report two studies that examined how input variables for lexical processing (frequency and predictability) in these models influence the processing of line-final words.
Behav Res Methods
December 2024
Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China.
Word associations are among the most direct ways to measure word meaning in human minds, capturing various relationships, even those formed by non-linguistic experiences. Although large-scale word associations exist for Dutch, English, and Spanish, there is a lack of data for Mandarin Chinese, the most widely spoken language from a distinct language family. Here we present the Small World of Words-Zhongwen (Chinese) (SWOW-ZH), a word association dataset of Mandarin Chinese derived from a three-response word association task.
J Exp Psychol Learn Mem Cogn
December 2024
Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London.
Fluent reading comprehension demands the rapid access and integration of word meanings. This can be challenging when lexically ambiguous words have less frequent meanings (e.g.
Behav Res Methods
December 2024
McMaster University, MELD Office 4045, L.R. Wilson Hall, Hamilton, ON, L8N 1E9, Canada.
PeerJ Comput Sci
November 2024
Department of Informatics, Constantine the Philosopher University in Nitra, Nitra, Slovak Republic.
This study introduces a new approach to text tokenization, SlovaK Morphological Tokenizer (SKMT), which integrates the morphology of the Slovak language into the training process using the Byte-Pair Encoding (BPE) algorithm. Unlike conventional tokenizers, SKMT focuses on preserving the integrity of word roots in individual tokens, crucial for maintaining lexical meaning. The methodology involves segmenting and extracting word roots from morphological dictionaries and databases, followed by preprocessing and training SKMT alongside a traditional BPE tokenizer.