The major aim of the present megastudy of picture-naming norms was to address the shortcomings of the picture data sets currently available for psychological and linguistic research by creating a new database of normed colour images that researchers around the world can rely on in their investigations. To this end, we employed a new form of normative study, namely a megastudy, in which 1,620 colour photographs of items spanning 42 semantic categories were named and rated by a group of German speakers. This was done to establish the following linguistic norms: speech onset times (SOT), name agreement, accuracy, familiarity, visual complexity, valence, and arousal.
Emotions play a fundamental role in language learning, use, and processing. Words denoting positivity account for a larger part of the lexicon than words denoting negativity, and they also tend to be used more frequently, a phenomenon known as positivity bias. However, language experience changes over an individual's lifetime, making the examination of the emotion-laden lexicon an important topic not only across the life span but also across languages.
Vocabulary size seems to be affected by multiple factors, including those that belong to the properties of the words themselves and those that relate to the characteristics of the individuals assessing the words. In this study, we present results from a crowdsourced lexical decision megastudy in which more than 150,000 native speakers from around 20 Spanish-speaking countries performed a lexical decision task on 70 target words selected from a list of about 45,000 Spanish words. We examined how demographic characteristics such as age, education level, and multilingualism affected participants' vocabulary size.
We present a new dataset of English word recognition times for a total of 62 thousand words, called the English Crowdsourcing Project. The data were collected via an internet vocabulary test in which more than one million people participated. The present dataset is limited to native English speakers.
We present a new database of Dutch word recognition times for a total of 54 thousand words, called the Dutch Crowdsourcing Project. The data were collected with an internet vocabulary test. The database is limited to native Dutch speakers.
We monitored the progress of 40 children when they first started to acquire a second language (L2) implicitly through immersion. Employing a longitudinal design, we tested them before they had any notions of an L2 (Time 0) and after 1 school year of L2 exposure (Time 1) to determine whether cognitive abilities can predict the success of L2 learning. Task administration included measures of intelligence, cognitive control, and language skills.
We present word prevalence data for 61,858 English words. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people.
According to a recent study, semantic similarity between concrete entities correlates with the similarity of activity patterns in left middle IPS during category naming. We examined the replicability of this effect under passive viewing conditions, the potential role of visuoperceptual similarity, where the effect is situated compared to regions that have been previously implicated in visuospatial attention, and how it compares to effects of object identity and location. Forty-six subjects participated.
The correspondence in meaning extracted from written versus spoken input remains to be fully understood neurobiologically. Here, in a total of 38 subjects, the functional anatomy of cross-modal semantic similarity for concrete words was determined based on a dual criterion: First, a voxelwise univariate analysis had to show significant activation during a semantic task (property verification) performed with written and spoken concrete words compared to the perceptually matched control condition. Second, in an independent dataset, in these clusters, the similarity in fMRI response pattern to two distinct entities, one presented as a written and the other as a spoken word, had to correlate with the similarity in meaning between these entities.
Based on an analysis of the literature and a large scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, the average person learns 6,000 extra lemmas or about one new lemma every 2 days.
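The quoted learning rate ("about one new lemma every 2 days") follows directly from the reported figures; a minimal back-of-the-envelope check:

```python
# Arithmetic check of the reported vocabulary growth rate:
# 6,000 extra lemmas acquired between ages 20 and 60.
extra_lemmas = 6_000
years = 60 - 20
days = years * 365.25

lemmas_per_day = extra_lemmas / days
days_per_lemma = 1 / lemmas_per_day

print(round(days_per_lemma, 1))  # roughly one new lemma every ~2.4 days
```

So the "every 2 days" figure in the abstract is a rounded-down version of roughly one lemma per 2.4 days.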
J Exp Psychol Hum Percept Perform
March 2016
Keuleers, Stevens, Mandera, and Brysbaert (2015) presented a new variable, word prevalence, defined as word knowledge in the population. Some words are known to more people than others. This is particularly true for low-frequency words (e.
Q J Exp Psychol (Hove)
November 2016
This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics.
This paper presents the first systematic examination of the monolingual and bilingual frequency effect (FE) during natural reading. We analyzed single fixation durations on content words for participants reading an entire novel. Unbalanced bilinguals and monolinguals show a similarly sized FE in their mother tongue (L1), but for bilinguals the FE is considerably larger in their second language (L2) than in their L1.
We use the results of a large online experiment on word knowledge in Dutch to investigate variables influencing vocabulary size in a large population and to examine the effect of word prevalence (the percentage of a population knowing a word) as a measure of word occurrence. Nearly 300,000 participants were presented with about 70 word stimuli (selected from a list of 53,000 words) in an adapted lexical decision task. We identify age, education, and multilingualism as the most important factors influencing vocabulary size.
Q J Exp Psychol (Hove)
November 2016
Subjective ratings for age of acquisition, concreteness, affective valence, and many other variables are an important element of psycholinguistic research. However, even for well-studied languages, ratings usually cover just a small part of the vocabulary. A possible solution involves using corpora to build a semantic similarity space and to apply machine learning techniques to extrapolate existing ratings to previously unrated words.
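The extrapolation approach described above can be sketched with a toy similarity space. The vectors and ratings below are invented for illustration; a real application would use corpus-derived embeddings and a properly trained learner rather than this nearest-neighbour shortcut:

```python
import numpy as np

# Toy semantic space: 2-D vectors standing in for corpus-derived embeddings.
# All vectors and valence ratings here are invented for illustration.
vectors = {
    "happy":  np.array([0.90, 0.10]),
    "joyful": np.array([0.85, 0.15]),
    "sad":    np.array([0.10, 0.90]),
    "gloomy": np.array([0.15, 0.85]),
}
valence = {"happy": 8.0, "joyful": 7.8, "sad": 2.0}  # known ratings only

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def extrapolate(word, k=3):
    # Similarity-weighted average of the k most similar rated words.
    sims = sorted(((cosine(vectors[word], vectors[w]), r)
                   for w, r in valence.items()), reverse=True)[:k]
    total = sum(s for s, _ in sims)
    return sum(s * r for s, r in sims) / total

print(round(extrapolate("gloomy"), 2))  # pulled toward "sad", the nearest rated word
```

With richer embeddings the same idea scales to extrapolating thousands of unrated words from a few thousand human ratings.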
We present SUBTLEX-PL, Polish word frequencies based on movie subtitles. In two lexical decision experiments, we compare the new measures with frequency estimates derived from another Polish text corpus that includes predominantly written materials. We show that the frequencies derived from the two corpora perform best in predicting human performance in a lexical decision task if used in a complementary way.
Q J Exp Psychol (Hove)
April 2015
We present word frequencies based on subtitles of British television programmes. We show that the SUBTLEX-UK word frequencies explain more of the variance in the lexical decision times of the British Lexicon Project than the word frequencies based on the British National Corpus and the SUBTLEX-US frequencies. In addition to the word form frequencies, we also present measures of contextual diversity, part-of-speech-specific word frequencies, word frequencies in children's programmes, and word bigram frequencies, giving researchers of British English access to the full range of norms recently made available for other languages.
We assess the amount of shared variance between three measures of visual word recognition latencies: eye movement latencies, lexical decision times, and naming times. After partialling out the effects of word frequency and word length, two well-documented predictors of word recognition latencies, we see that 7-44% of the variance is uniquely shared between lexical decision times and naming times, depending on the frequency range of the words used. A similar analysis of eye movement latencies shows that the percentage of variance they uniquely share either with lexical decision times or with naming times is much lower.
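The "partialling out" step can be illustrated with synthetic data: regress each latency measure on frequency and length, then square the correlation of the residuals. The variable names and effect sizes below are invented for illustration, not taken from the study:

```python
import numpy as np

# Synthetic sketch: shared variance between lexical decision (ld) and
# naming (nmg) latencies after partialling out frequency and length.
rng = np.random.default_rng(0)
n = 500
log_freq = rng.normal(3.0, 1.0, n)
length = rng.integers(3, 10, n).astype(float)
shared = rng.normal(0.0, 20.0, n)          # task-shared latency component
ld = 600 - 30 * log_freq + 8 * length + shared + rng.normal(0, 15, n)
nmg = 550 - 25 * log_freq + 6 * length + shared + rng.normal(0, 15, n)

def residuals(y, predictors):
    # Regress y on the predictors (plus intercept); return the residuals.
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_ld = residuals(ld, [log_freq, length])
r_nmg = residuals(nmg, [log_freq, length])
shared_variance = np.corrcoef(r_ld, r_nmg)[0, 1] ** 2
print(round(shared_variance * 100, 1))  # % variance uniquely shared
```

The squared residual correlation is the quantity the 7-44% figures refer to: whatever the two tasks still have in common once the frequency and length effects are removed.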
Behav Res Methods
December 2012
The SUBTLEX-US corpus has been parsed with the CLAWS tagger, so that researchers have information about the possible word classes (parts-of-speech, or PoSs) of the entries. Five new columns have been added to the SUBTLEX-US word frequency list: the dominant (most frequent) PoS for the entry, the frequency of the dominant PoS, the frequency of the dominant PoS relative to the entry's total frequency, all PoSs observed for the entry, and the respective frequencies of these PoSs. Because the current definition of lemma frequency does not seem to provide word recognition researchers with useful information (as illustrated by a comparison of the lemma frequencies and the word form frequencies from the Corpus of Contemporary American English), we have not provided a column with this variable.
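The five new columns can be pictured with a toy entry. The part-of-speech counts below are invented, not actual SUBTLEX-US values:

```python
# Hypothetical per-PoS frequency counts for one word-form entry
# (invented numbers, not real SUBTLEX-US data).
pos_freq = {"noun": 812, "verb": 147, "adjective": 41}

total = sum(pos_freq.values())
dom_pos, dom_freq = max(pos_freq.items(), key=lambda kv: kv[1])

entry = {
    "dom_pos": dom_pos,                        # dominant (most frequent) PoS
    "dom_pos_freq": dom_freq,                  # frequency of the dominant PoS
    "dom_pos_rel": round(dom_freq / total, 3), # share of the entry's total frequency
    "all_pos": sorted(pos_freq, key=pos_freq.get, reverse=True),
    "all_pos_freq": sorted(pos_freq.values(), reverse=True),
}
print(entry["dom_pos"], entry["dom_pos_rel"])
```

A `dom_pos_rel` close to 1 indicates a word used almost exclusively in one word class; values near 0.5 flag entries whose PoS is genuinely ambiguous.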
People tend to slow down after they make an error. This phenomenon, generally referred to as post-error slowing, has been hypothesized to reflect perceptual distraction, time wasted on irrelevant processes, an a priori bias against the response made in error, increased variability in a priori bias, or an increase in response caution. Although the response caution interpretation has dominated the empirical literature, little research has attempted to test this interpretation in the context of a formal process model.
We report performance measures for lexical decision (LD), word naming (NMG), and progressive demasking (PDM) for a large sample of monosyllabic monomorphemic French words (N = 1,482). We compare the tasks and also examine the impact of word length, word frequency, initial phoneme, orthographic and phonological distance to neighbors, age-of-acquisition, and subjective frequency. Our results show that objective word frequency is by far the most important variable to predict reaction times in LD.
We present a new database of lexical decision times for English words and nonwords, for which two groups of British participants each responded to 14,365 monosyllabic and disyllabic words and the same number of nonwords for a total duration of 16 h (divided over multiple sessions). This database, called the British Lexicon Project (BLP), fills an important gap between the Dutch Lexicon Project (DLP; Keuleers, Diependaele, & Brysbaert, Frontiers in Psychology, 1, 174, 2010) and the English Lexicon Project (ELP; Balota et al.
In this Perspective Article we assess the usefulness of Google's new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone), the Google American English frequencies explain 11% less of the variance in the lexical decision times from the English Lexicon Project (Balota et al., 2007) than the SUBTLEX-US word frequencies, based on a corpus of 51 million words from film and television subtitles.