Language-grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, the datasets and evaluation procedures used in these tasks are replete with flaws that allow vision and language (V&L) algorithms to achieve good performance without a robust understanding of vision and language. We argue for this position based on several recent studies in the V&L literature and our own observations of dataset bias, robustness, and spurious correlations. Finally, we propose that several of these challenges can be mitigated by the creation of carefully designed benchmarks.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861287
DOI: http://dx.doi.org/10.3389/frai.2019.00028
Neural Netw
December 2024
Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China.
Recently, the field of multimodal large language models (MLLMs) has grown rapidly, with many Large Vision-Language Models (LVLMs) relying on sequential visual representations. In these models, images are broken down into numerous tokens before being fed into the Large Language Model (LLM) alongside text prompts. However, the opaque nature of these models poses significant challenges to their interpretability, particularly when dealing with complex reasoning tasks.
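The sequential visual representation described above can be sketched in a few lines. This is a minimal toy illustration, not any specific LVLM's implementation: the patch size, embedding width, and random projection are all assumptions chosen for clarity. An image is split into non-overlapping patches, each patch is linearly projected into the language model's embedding space, and the resulting visual tokens are concatenated with the text-prompt embeddings before entering the LLM.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(image, patch=14):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    return (image[:rows * patch, :cols * patch]
            .reshape(rows, patch, cols, patch, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(rows * cols, patch * patch * c))

image = rng.standard_normal((224, 224, 3))
patches = patchify(image)                      # (256, 588): 16x16 patches of 14*14*3 values

d_model = 64                                   # toy embedding width (assumption)
proj = rng.standard_normal((patches.shape[1], d_model))
visual_tokens = patches @ proj                 # (256, 64) sequence of visual tokens

text_tokens = rng.standard_normal((10, d_model))   # stand-in prompt embeddings
llm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(llm_input.shape)                         # (266, 64)
```

The interpretability challenge follows directly from this design: once the 256 visual tokens are interleaved with text tokens, attributing a model's answer to particular image regions requires tracing attention back through the projection.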
Pediatrics
January 2025
Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada.
Background And Objectives: The likelihood and severity of neurodevelopmental impairment (NDI) affect critical health care decisions. NDI definitions were developed without parental perspectives. We investigated the agreement between parental vs medical classification of NDI among children born preterm.
Background: Primary progressive aphasia (PPA) is a language-based dementia linked with underlying Alzheimer's disease (AD) or frontotemporal dementia. Clinicians often report difficulty differentiating between the logopenic (lv) and nonfluent/agrammatic (nfv) subtypes, as both variants present with disruptions to "fluency" yet for different underlying reasons. In English, acoustic and linguistic markers from connected speech samples have shown promise in machine learning (ML)-based differentiation of nfv from lv.
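The ML-based differentiation from connected speech mentioned above can be illustrated with a toy sketch. Everything here is hypothetical: the feature names (speech rate, mean pause length, grammatical error rate), the synthetic values, and the nearest-centroid classifier are illustrative stand-ins, not the actual markers or models used in these studies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-speaker feature vectors (all values invented for illustration):
# [speech_rate_wpm, mean_pause_s, grammar_error_rate]
nfv = rng.normal([70, 1.2, 0.30], [10, 0.2, 0.05], size=(20, 3))   # slow, agrammatic speech
lv = rng.normal([110, 0.8, 0.05], [10, 0.2, 0.02], size=(20, 3))   # word-finding pauses

X = np.vstack([nfv, lv])
y = np.array([0] * 20 + [1] * 20)            # 0 = nfv, 1 = lv

# Standardize features, then classify each sample by its nearest class centroid.
mu, sd = X.mean(axis=0), X.std(axis=0)
Xz = (X - mu) / sd
centroids = np.array([Xz[y == k].mean(axis=0) for k in (0, 1)])

def predict(x):
    xz = (x - mu) / sd
    return int(np.argmin(np.linalg.norm(centroids - xz, axis=1)))

pred = np.array([predict(x) for x in X])
accuracy = (pred == y).mean()
print(accuracy)
```

On real speech data the features are extracted from transcribed and acoustically analyzed samples, and a held-out evaluation would be needed; this sketch only shows the shape of the feature-vector-to-subtype pipeline.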
Alzheimers Dement
December 2024
UT Health San Antonio, San Antonio, TX, USA.
Background: Primary progressive aphasia (PPA) is a language-led dementia associated with underlying Alzheimer's disease (AD) or frontotemporal lobar degeneration pathology. As part of the Alzheimer's spectrum, logopenic (lv) PPA may be particularly difficult to distinguish from amnestic AD, due to overlapping clinical features. Analysis of linguistic and acoustic variables derived from connected speech has shown promise as a diagnostic tool for differentiating dementia subtypes.