Challenges and Prospects in Vision and Language Research.

Front Artif Intell

Center for Imaging Science, Rochester Institute of Technology, Rochester, NY, United States.

Published: December 2019

Language-grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, the datasets and evaluation procedures used in these tasks are replete with flaws that allow vision and language (V&L) algorithms to achieve good performance without a robust understanding of vision and language. We argue for this position based on several recent studies in the V&L literature and our own observations of dataset bias, robustness, and spurious correlations. Finally, we propose that several of these challenges can be mitigated by creating carefully designed benchmarks.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861287 (PMC)
http://dx.doi.org/10.3389/frai.2019.00028 (DOI)

Similar Publications

Recently, the field of multimodal large language models (MLLMs) has grown rapidly, with many Large Vision-Language Models (LVLMs) relying on sequential visual representations. In these models, images are broken down into numerous tokens before being fed into the Large Language Model (LLM) alongside text prompts. However, the opaque nature of these models poses significant challenges to their interpretability, particularly when dealing with complex reasoning tasks.

Background and Objectives: The likelihood and severity of neurodevelopmental impairment (NDI) affect critical health care decisions. NDI definitions were developed without parental perspectives. We investigated the agreement between parental and medical classification of NDI among children born preterm.

Background: Primary progressive aphasia (PPA) is a language-based dementia linked with underlying Alzheimer's disease (AD) or frontotemporal dementia. Clinicians often report difficulty differentiating between the logopenic (lv) and nonfluent/agrammatic (nfv) subtypes, as both variants present with disruptions to "fluency" yet for different underlying reasons. In English, acoustic and linguistic markers from connected speech samples have shown promise in machine learning (ML)-based differentiation of nfv from lv.

Biomarkers.

Alzheimers Dement

December 2024

UT Health San Antonio, San Antonio, TX, USA.

Background: Primary progressive aphasia (PPA) is a language-led dementia associated with underlying Alzheimer's disease (AD) or frontotemporal lobar degeneration pathology. As part of the Alzheimer's spectrum, logopenic (lv) PPA may be particularly difficult to distinguish from amnestic AD, due to overlapping clinical features. Analysis of linguistic and acoustic variables derived from connected speech has shown promise as a diagnostic tool for differentiating dementia subtypes.

Technology and Dementia Preconference.

Alzheimers Dement

December 2024

UT Health San Antonio, San Antonio, TX, USA.

Background: Primary progressive aphasia (PPA) is a language-led dementia associated with underlying Alzheimer's disease (AD) or frontotemporal lobar degeneration pathology. As part of the Alzheimer's spectrum, logopenic (lv) PPA may be particularly difficult to distinguish from amnestic AD, due to overlapping clinical features. Analysis of linguistic and acoustic variables derived from connected speech has shown promise as a diagnostic tool for differentiating dementia subtypes.
