SUBTLEX-CAT: Subtitle word frequencies and contextual diversity for Catalan.

Behav Res Methods

Department of Psychology and Research Center for Behavior Assessment, Universitat Rovira i Virgili, Tarragona, Spain.

Published: February 2020

SUBTLEX-CAT is a word frequency and contextual diversity database for Catalan, obtained from a 278-million-word corpus based on subtitles supplied from broadcast Catalan television. Like all previous SUBTLEX corpora, it comprises subtitles from films and TV series. In addition, it includes a wider range of TV shows (e.g., news, documentaries, debates, and talk shows) than has been included in most previous databases. Frequency metrics were obtained for the whole corpus, on the one hand, and only for films and fiction TV series, on the other. Two lexical decision experiments revealed that the subtitle-based metrics outperformed the previously available frequency estimates, computed from either written texts or texts from the Internet. Furthermore, the metrics obtained from the whole corpus were better predictors than the ones obtained from films and fiction TV series alone. In both experiments, the best predictor of response times and accuracy was contextual diversity.
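The two kinds of metric named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: word frequency as occurrences per million tokens, and contextual diversity as the percentage of documents (here, subtitle files) in which a word appears at least once. The function name `subtitle_metrics`, the simple letter-based tokenizer, and the toy sentences are illustrative assumptions, not the authors' actual pipeline.

```python
import re
from collections import Counter


def subtitle_metrics(documents):
    """Compute per-word frequency and contextual diversity.

    `documents` is a list of strings, one per subtitle file (hypothetical input).
    """
    token_counts = Counter()  # total occurrences of each word
    doc_counts = Counter()    # number of documents containing each word
    total_tokens = 0

    for text in documents:
        # Simplified tokenizer (an assumption): runs of Unicode letters only.
        # It splits Catalan forms such as "l·l" and elided articles ("l'home").
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        total_tokens += len(tokens)
        token_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per document

    n_docs = len(documents)
    return {
        word: {
            "count": count,
            "per_million": count * 1_000_000 / total_tokens,
            "cd_percent": 100 * doc_counts[word] / n_docs,
        }
        for word, count in token_counts.items()
    }


# Toy usage with three invented "subtitle files".
metrics = subtitle_metrics([
    "El gat dorm.",
    "El gos corre i el gat mira.",
    "Bon dia!",
])
print(metrics["el"])
```

In this toy corpus, "el" occurs three times but in only two of the three documents, which shows how the two measures can diverge: a word may be frequent overall while appearing in relatively few contexts.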


Source
http://dx.doi.org/10.3758/s13428-019-01233-1

Publication Analysis

Top Keywords

contextual diversity: 12
metrics corpus: 8
films fiction: 8
fiction series: 8
subtlex-cat subtitle: 4
subtitle word: 4
word frequencies: 4
frequencies contextual: 4
diversity catalan: 4
catalan subtlex-cat: 4

Similar Publications

A Review of CNN Applications in Smart Agriculture Using Multimodal Data.

Sensors (Basel)

January 2025

Institut de Recherche en Informatique de Toulouse, IRIT UMR5505 CNRS, 31400 Toulouse, France.

This review explores the applications of Convolutional Neural Networks (CNNs) in smart agriculture, highlighting recent advancements across various applications including weed detection, disease detection, crop classification, water management, and yield prediction. Based on a comprehensive analysis of more than 115 recent studies, coupled with a bibliometric study of the broader literature, this paper contextualizes the use of CNNs within Agriculture 5.0, where technological integration optimizes agricultural efficiency.


Assessing riparian functioning condition for improved ecosystem services: A case study of the Back Creek watershed (Virginia, USA).

J Environ Manage

January 2025

U.S. Environmental Protection Agency, Office of Research and Development, 960 College Station Rd., Athens, GA, 30605, USA.

Riparian functioning condition refers to a rating and description of the current ecological status of a reach of a riparian ecosystem in consideration of its potential hydrology, vegetation, and geomorphology. Reach rating options are Proper Functioning Condition (PFC), Functional-At-Risk (FAR), Non-Functional, and apparent or monitored trends. We assessed the functioning condition of flowing riverbank areas of Back Creek located in Virginia (USA) following a PFC protocol developed by the U.


From experts' perspective, factors affecting the effectiveness of online educational programs in promoting the health literacy of MS patients: A grounded theory approach.

Patient Educ Couns

January 2025

Nano Tech Laboratory, School of Engineering, Faculty of Science and Engineering, Macquarie University, Sydney, Australia.

Background: Online educational programs have emerged as a promising tool for promoting health literacy (HL) among multiple sclerosis (MS) patients. However, identifying influencing factors is crucial for maximizing their effectiveness.

Aim: This study aimed to explain the factors affecting the effectiveness of online educational programs in promoting HL among MS patients in Iran.


BioGSF: a graph-driven semantic feature integration framework for biomedical relation extraction.

Brief Bioinform

November 2024

Suzhou Key Lab of Multi-modal Data Fusion and Intelligent Healthcare, No. 1188 Wuzhong Avenue, Wuzhong District Suzhou, Suzhou 215004, China.

The automatic and accurate extraction of diverse biomedical relations from literature constitutes the core elements of medical knowledge graphs, which are indispensable for healthcare artificial intelligence. Currently, fine-tuning through stacking various neural networks on pre-trained language models (PLMs) represents a common framework for end-to-end resolution of the biomedical relation extraction (RE) problem. Nevertheless, sequence-based PLMs, to a certain extent, fail to fully exploit the connections between semantics and the topological features formed by these connections.


This study introduces a novel AI-driven approach to support elderly patients in Thailand with medication management, focusing on accurate drug label interpretation. Two model architectures were explored: a Two-Stage Optical Character Recognition (OCR) and Large Language Model (LLM) pipeline combining EasyOCR with Qwen2-72b-instruct and a Uni-Stage Visual Question Answering (VQA) model using Qwen2-72b-VL. Both models operated in a zero-shot capacity, utilizing Retrieval-Augmented Generation (RAG) with DrugBank references to ensure contextual relevance and accuracy.

