SUBTLEX-PL: subtitle-based word frequency estimates for Polish.

Behav Res Methods

Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000, Gent, Belgium.

Published: June 2015

We present SUBTLEX-PL, Polish word frequencies based on movie subtitles. In two lexical decision experiments, we compare the new measures with frequency estimates derived from another Polish text corpus that includes predominantly written materials. We show that the frequencies derived from the two corpora perform best in predicting human performance in a lexical decision task if used in a complementary way. Our results suggest that the two corpora may have unequal potential for explaining human performance for words in different frequency ranges and that corpora based on written materials severely overestimate frequencies for formal words. We discuss some of the implications of these findings for future studies comparing different frequency estimates. In addition to frequencies for word forms, SUBTLEX-PL includes measures of contextual diversity, part-of-speech-specific word frequencies, frequencies of associated lemmas, and word bigrams, providing researchers with necessary tools for conducting psycholinguistic research in Polish. The database is freely available for research purposes and may be downloaded from the authors' university Web site at http://crr.ugent.be/subtlex-pl.
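
As an illustration of the kinds of measures listed in the abstract, the following is a minimal sketch of how word-form frequencies, contextual diversity, and word bigram counts might be derived from tokenized subtitle files. It is not the authors' pipeline: the function name `frequency_measures`, the toy input, and the definition of contextual diversity as the proportion of files containing a word are all assumptions made for this example.

```python
from collections import Counter


def frequency_measures(documents):
    """Compute simple corpus measures from tokenized documents.

    `documents` is a list of token lists, one per subtitle file
    (a hypothetical input format assumed for this sketch).
    """
    word_counts = Counter()    # total occurrences of each word form
    doc_counts = Counter()     # number of documents containing each word form
    bigram_counts = Counter()  # occurrences of each adjacent word pair

    for tokens in documents:
        lowered = [t.lower() for t in tokens]
        word_counts.update(lowered)
        doc_counts.update(set(lowered))  # count each word once per document
        bigram_counts.update(zip(lowered, lowered[1:]))

    total = sum(word_counts.values())
    # Frequency per million tokens, a common normalization.
    per_million = {w: c * 1_000_000 / total for w, c in word_counts.items()}
    # Assumed definition: share of documents in which the word occurs.
    contextual_diversity = {w: c / len(documents) for w, c in doc_counts.items()}
    return per_million, contextual_diversity, bigram_counts


# Toy corpus of three "subtitle files" (invented for illustration).
docs = [["To", "jest", "kot"], ["to", "pies"], ["kot", "i", "pies"]]
pm, cd, bg = frequency_measures(docs)
```

In this toy corpus, `pm["to"]` is the per-million frequency of "to" and `cd["kot"]` is the share of files containing "kot". A real corpus would also need proper tokenization, and lemma and part-of-speech frequencies require a tagger, which this sketch leaves out.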

Source
http://dx.doi.org/10.3758/s13428-014-0489-4

Similar Publications

Background: Practice guidelines recommend patient management based on scientific evidence. Quality indicators gauge adherence to such recommendations and assess health care quality. They are usually defined as adverse event rates, which may not fully capture guideline adherence over time.

Texture analysis generates image parameters from F-18 fluorodeoxyglucose positron emission tomography/computed tomography (FDG PET/CT). Although some parameters correlate with tumor biology and clinical attributes, their types and implications can be complex. To overcome this limitation, pseudotime analysis was applied to texture parameters to estimate changes in individual sample characteristics, and the prognostic significance of the estimated pseudotime of primary tumors was evaluated.

Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge.

Behav Res Methods

December 2024

ETSI de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense, 30, 28040, Madrid, Spain.

This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures.

Despite being largely spoken and studied by language and cognitive scientists, Italian lacks large resources of language processing data. The Italian Crowdsourcing Project (ICP) is a dataset of word recognition times and accuracy including responses to 130,465 words, which makes it the largest dataset of its kind item-wise. The data were collected in an online word knowledge task in which over 156,000 native speakers of Italian took part.

Tuberculosis (TB) is the leading cause of death from a single infectious agent. The burden is highest in some low- and middle-income countries. One-quarter of the world's population is estimated to have been infected with TB, which is the seedbed for progressing from TB infection to the deadly and contagious disease itself.

