Automating the extraction of behavioral criteria indicative of Autism Spectrum Disorder (ASD) in electronic health records (EHRs) can contribute significantly to the effort to monitor the condition. Word embedding algorithms such as Word2Vec can encode the semantic meanings of words in vectors and assist in automated vocabulary discovery from EHRs. However, the text available for training word embeddings for ASD is minuscule compared to the billions of tokens typically used. We evaluate the importance of corpus specificity versus size and hypothesize that, for specific domains, small corpora can generate excellent word embeddings. We custom-built 6 ASD-themed corpora (N=4482), using ASD EHRs and abstracts from PubMed (N=39K) and PsychInfo (N=69K), and evaluated them. The most useful 200-dimension embeddings were generated from the small ASD EHR data. Because of the diversity of their vocabulary, the abstract-based embeddings generated fewer related terms and saw minimal improvement when the size of the corpus increased.
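
The vocabulary-discovery step mentioned in the abstract, finding terms related to a seed word from its embedding, can be sketched roughly as a nearest-neighbor search by cosine similarity. This is only an illustrative sketch, not the authors' method: the 3-dimensional vectors and the terms in `TOY_EMBEDDINGS` are invented for the example (the study used 200-dimension Word2Vec embeddings), and the helper names `cosine` and `related_terms` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def related_terms(seed, embeddings, top_k=3):
    """Return the top_k terms most similar to `seed`, best first."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented toy vectors (assumption): real embeddings would be learned
# from a corpus, not written by hand.
TOY_EMBEDDINGS = {
    "rocking": [0.9, 0.1, 0.0],
    "flapping": [0.8, 0.2, 0.1],
    "spinning": [0.7, 0.3, 0.0],
    "aspirin": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for term, score in related_terms("rocking", TOY_EMBEDDINGS, top_k=2):
        print(f"{term}\t{score:.3f}")
```

With trained embeddings in place of the toy dictionary, the same ranking would yield candidate related terms for a seed word; how many of those candidates are truly related is the kind of quantity the abstract compares across corpora.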

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6371367

Publication Analysis

Top Keywords

training word: 8
word embedding: 8
autism spectrum: 8
spectrum disorder: 8
disorder asd: 8
word embeddings: 8
asd: 5
optimizing corpus: 4
corpus creation: 4
creation training: 4

Similar Publications

Background And Objectives: Medical student clinical clerkship evaluations provide feedback for growth and contribute to the clerkship grade and the student's residency application. Their importance is expected to increase even more with the recent change of the US Medical Licensing Examination Step 1 to a pass/fail designation. Timely completion of medical student clerkship evaluations is a problem.

Bibliometric Analysis of Anxiety and Physical Education in Web of Science-A Performance and Co-Word Study.

Pediatr Rep

December 2024

Department of Didactics and School Organization, Faculty of Education, Economics and Technology of Ceuta, University of Granada, 51001 Ceuta, Spain.

This study conducts a comprehensive bibliometric analysis of the concepts 'physical education' and 'anxiety' (PHYEDU_ANX) in the Web of Science (WoS) database. No previous bibliometric studies were found that addressed this intersection, so this research is a pioneering exploration of this knowledge gap. The aim of the study is to examine the presence of both concepts in the scientific literature, identifying their trends, approaches, and future prospects.

Sequence order resolves ambiguity in a nonlinguistic visual categorization task.

Atten Percept Psychophys

December 2024

Department of Psychology, Emory University, 36 Eagle Row, Atlanta, GA, 30322, USA.

When we encounter an unfamiliar word in a sentence, word order can be used to determine the grammatical category to which that word belongs and clarify ambiguity. However, it is unclear whether a similar categorization effect occurs in nonlinguistic contexts. We created three perceptually distinct categories of shape stimuli: rounded (A), squared (B), and pointed (C).

Objective: To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques.

Materials And Methods: We first created a lexicon and regular expression lists from literature-driven stem words for linguistic features of stigmatizing patient labels, doubt markers, and scare quotes within EHRs. The lexicon was further extended using Word2Vec and GPT 3.

Cesium accumulation and plant growth promotion characteristics of A10 isolated from L. rhizosphere soil.

Int J Phytoremediation

December 2024

Key Laboratory of the Evaluation and Monitoring of Southwest Land Resources (Ministry of Education), Sichuan Normal University, Chengdu, China.

Combined microbial-plant remediation has increasingly been used to remediate heavy metal-contaminated soil. Some microorganisms can enhance phytoremediation efficiency by solubilizing heavy metals and improve plant growth by producing phytohormones in heavy metal-contaminated soils. In the present study, a strongly cesium (Cs)-tolerant fungal strain was identified among microorganisms from Cs-contaminated soil, and the enrichment conditions for Cs were optimized.
