Weakly supervised temporal language grounding (TLG) aims to locate events in untrimmed videos from natural language queries without temporal annotations, which requires a deep understanding of semantic context across both the video and text modalities. Existing methods often focus on simple correlations between query phrases and isolated video segments, neglecting the event-oriented semantic coherence and consistency that accurate temporal grounding requires. Such partial frame correlations can produce misleading results. To address these limitations, we propose the Event-oriented State Alignment Network (ESAN), which constructs "start-event-end" semantic state sets for both textual and video data. ESAN uses relative entropy for cross-modal alignment through knowledge distillation from pre-trained large models, enhancing semantic coherence within each modality and ensuring consistency across modalities. Our approach leverages vision-language models to extract static frame semantics and large language models to capture dynamic semantic changes, facilitating a more comprehensive understanding of events. Experiments on two benchmark datasets demonstrate that ESAN significantly outperforms existing methods. By reducing false high correlations, ESAN addresses the shortcomings of previous approaches and shows potential to improve the precision and reliability of temporal language grounding.
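The relative-entropy alignment described in the abstract can be illustrated with a minimal Python sketch. It assumes that each modality produces logits over the three states (start, event, end) for a candidate segment and that a frozen pre-trained model acts as the teacher. The function names, the temperature value, and the T² scaling are hypothetical choices borrowed from standard knowledge distillation, not ESAN's published implementation.

```python
import math
from typing import Sequence

# Hypothetical labels mirroring the abstract's "start-event-end" state sets.
STATES = ("start", "event", "end")


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Relative entropy KL(p || q) in nats; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def state_alignment_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Distillation-style loss pulling the student's state distribution toward
    the teacher's. The T**2 factor is the common distillation convention
    (an assumption here, not taken from the ESAN paper)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return (temperature ** 2) * kl_divergence(p_teacher, p_student)
```

For example, `state_alignment_loss([2.0, 0.5, -1.0], [1.8, 0.6, -0.9])` gives a small value because the two state distributions nearly agree, whereas reversing the student's logits gives a much larger one. KL(teacher || student) is used because distillation usually treats the teacher distribution as the target; the paper's exact direction and weighting may differ.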
Full text | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11431080 | PMC |
http://dx.doi.org/10.3390/e26090730 | DOI |
J Neurodev Disord
January 2025
Graduate Neuroscience Program, University of California, Riverside, CA, USA.
Background: Fragile X syndrome (FXS) is a leading known genetic cause of intellectual disability and of behaviors associated with autism spectrum disorders (ASD). A consistent and debilitating phenotype of FXS is auditory hypersensitivity, which may lead to delayed language and high anxiety. Consistent with findings in human FXS studies, the mouse model of FXS, the Fmr1 knockout (KO) mouse, shows auditory hypersensitivity and temporal processing deficits.
J Vis
January 2025
Department of Communicative Disorders, University of Alabama, Tuscaloosa, AL, USA.
The visual environment of sign language users differs markedly in its spatiotemporal parameters from that of non-signers. Although the importance of temporal and spectral resolution in the auditory modality for language development is well established, the spectrotemporal parameters of visual attention necessary for sign language comprehension remain less understood. This study investigates visual temporal resolution in learners of American Sign Language (ASL) at various stages of acquisition to determine how experience with sign language affects perceptual sampling.
Alzheimers Dement
December 2024
AviadoBio, London, London, United Kingdom.
Background: Frontotemporal dementia (FTD) presents with changes in personality, behaviour and language and is the second most common cause of young-onset dementia after Alzheimer's disease. Loss-of-function mutations in GRN, which encodes progranulin (PGRN), cause FTD in the heterozygous state and account for 5-10% of all FTD cases. PGRN is essential for normal lysosomal function and neuronal survival.
Alzheimers Dement
December 2024
Trinity Biomedical Sciences Institute, Trinity College Dublin, University of Dublin, Dublin, Dublin 2, Ireland.
Background: Amyotrophic lateral sclerosis (ALS) shares pathological and genetic underpinnings with frontotemporal dementia (FTD). ALS manifests with diverse symptoms, including progressive neuromotor degeneration and muscle weakness, but also cognitive-behavioural changes in up to half of cases. Resting-state EEG measures, particularly spectral power and functional connectivity, have been instrumental in discerning abnormal motor and cognitive network function in ALS [1]-[3].
Alzheimers Dement
December 2024
Sanatorio de la Trinidad Mitre, Buenos Aires, Argentina.
Background: Dural arteriovenous fistulas (DAVFs) are abnormal communications between dural arteries and cortical, meningeal, or dural sinus veins. They represent 10-15% of intracranial arteriovenous malformations. In rare cases, they have been associated with potentially reversible cognitive impairment and dementia.