We leveraged electronic health record (EHR) data from the Accelerating Data Value Across a National Community Health Center Network (ADVANCE) Clinical Research Network (CRN) to identify social risk factor clusters, assess their association with obstructive sleep apnea (OSA), and determine relevant clinical predictors of cardiovascular (CV) outcomes among those experiencing OSA. Geographically informed social indicators were used to define social risk factor clusters via latent class analysis. EHR-wide diagnoses were used as predictors of 5-year incidence of major adverse CV events (MACE) using STREAMLINE, an end-to-end rigorous and interpretable automated machine learning pipeline.
Background: Epistasis, the interaction between genetic loci where the effect of one locus is influenced by one or more other loci, plays a crucial role in the genetic architecture of complex traits. However, as the number of loci considered increases, the investigation of epistasis becomes exponentially more complex, making the selection of key features vital for effective downstream analyses. Relief-Based Algorithms (RBAs) are often employed for this purpose due to their reputation as "interaction-sensitive" algorithms and uniquely non-exhaustive approach.
Background: The investigation of epistasis becomes increasingly complex as more loci are considered due to the exponential expansion of possible interactions. Consequently, selecting key features that influence epistatic interactions is crucial for effective downstream analyses. Recognizing this challenge, this study investigates the efficiency of Relief-Based Algorithms (RBAs) in detecting higher-order epistatic interactions, which may be critical for understanding the genetic architecture of complex traits.
Objectives: To synthesize discussions among sexual minority men and gender diverse (SMMGD) individuals on mpox, given limited representation of SMMGD voices in existing mpox literature.
Methods: BERTopic (a topic modeling technique) was employed with human validations to analyze mpox-related tweets (n = 8,688; October 2020-September 2022) from 2,326 self-identified SMMGD individuals in the U.S.
Stud Health Technol Inform
January 2024
According to the World Stroke Organization, 12.2 million people worldwide will have their first stroke this year, almost half of whom will die as a result. Natural Language Processing (NLP) may improve stroke phenotyping; however, existing rule-based classifiers are rigid, resulting in inadequate performance.
Interrogating plasma cell-free DNA (cfDNA) to detect cancer offers promise; however, no current tests scan structural variants (SVs) throughout the genome. Here, we report a simple molecular workflow to enrich a tumorigenic SV (DNA palindromes/fold-back inversions) that often demarcates genomic amplification and its feasibility for cancer detection by combining low-throughput next-generation sequencing with automated machine learning (Genome-wide Analysis of Palindrome Formation, GAPF-seq). Tumor DNA signal manifested as skewed chromosomal distributions of high-coverage 1-kb bins (HCBs), differentiating 39 matched breast tumor DNA from normal DNA with an average AUC of 0.
Plasma cell-free DNA (cfDNA) is a promising source of gene mutations for cancer detection by liquid biopsy. However, no current tests interrogate chromosomal structural variants (SVs) genome-wide. Here, we report a simple molecular and sequencing workflow called Genome-wide Analysis of Palindrome Formation (GAPF-seq) to probe DNA palindromes, a type of SV that often demarcates gene amplification.
Supply-demand mismatch of ward resources ("ward capacity strain") alters care and outcomes. Narrow strain definitions and heterogeneous populations limit strain literature. To evaluate the predictive utility of a large set of candidate strain variables for in-hospital mortality and discharge destination among acute respiratory failure (ARF) survivors.
STREAMLINE is a simple, transparent, end-to-end automated machine learning (AutoML) pipeline for easily conducting rigorous machine learning (ML) modeling and analysis. The initial version is limited to binary classification. In this work, we extend STREAMLINE through implementing multiple regression-based ML models, including linear regression, elastic net, group lasso, and L21 norm.
Amyloid imaging has been widely used in Alzheimer's disease (AD) diagnosis and biomarker discovery through detecting the regional amyloid plaque density. These regional measurements must be normalized by a reference region to reduce noise and artifacts. To explore an optimal normalization strategy, we employ an automated machine learning (AutoML) pipeline, STREAMLINE, to conduct the AD diagnosis binary classification and perform permutation-based feature importance analysis with thirteen machine learning models.
Background: It is currently unknown if disease severity modifies response to therapy in pulmonary arterial hypertension (PAH). We aimed to explore if disease severity, as defined by established risk-prediction algorithms, modified response to therapy in randomised clinical trials in PAH.
Methods: We performed a meta-analysis using individual participant data from 18 randomised clinical trials of therapy for PAH submitted to the United States Food and Drug Administration to determine if predicted risk of 1-year mortality at randomisation modified the treatment effect on three outcomes: change in 6-min walk distance (6MWD), clinical worsening at 12 weeks and time to clinical worsening.
Our objective was to detect common barriers to post-acute care (B2PAC) among hospitalized older adults using natural language processing (NLP) of clinical notes from patients discharged home when a clinical decision support system recommended post-acute care. We annotated B2PAC sentences from discharge planning notes and developed an NLP classifier to identify the highest-value B2PAC class (negative patient preferences). Thirteen machine learning models were compared with Amazon's AutoGluon deep learning model.
Purpose: Predicting 30-day readmission risk is paramount to improving the quality of patient care. In this study, we compare sets of patient-, provider-, and community-level variables that are available at two different points of a patient's inpatient encounter (first 48 hours and the full encounter) to train readmission prediction models and identify possible targets for appropriate interventions that can potentially reduce avoidable readmissions.
Methods: Using electronic health record data from a retrospective cohort of 2,460 oncology patients and a comprehensive machine learning analysis pipeline, we trained and tested models predicting 30-day readmission on the basis of data available within the first 48 hours of admission and from the entire hospital encounter.
Sex-based differences in pulmonary arterial hypertension (PAH) are known, but the contribution to disease measures is understudied. We examined whether sex was associated with baseline 6-minute-walk distance (6MWD), hemodynamics, and functional class. We conducted a secondary analysis of participant-level data from randomized clinical trials of investigational PAH therapies conducted between 1998 and 2014 and provided by the U.
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, for discovering novel disease biomarkers, and for identifying targets for treatments. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences.
Background: Obesity is increasingly prevalent in pulmonary arterial hypertension (PAH) but is associated with improved survival, creating an "obesity paradox" in PAH. It is unknown if the improved outcomes could be attributable to obese patients deriving a greater benefit from PAH therapies.
Research Question: Does BMI modify treatment effectiveness in PAH?
Study Design And Methods: Using individual participant data, a meta-analysis was conducted of phase III, randomized, placebo-controlled trials of treatments for PAH submitted for approval to the U.
Background: Gene set enrichment analysis (GSEA) uses gene-level univariate associations to identify gene set-phenotype associations for hypothesis generation and interpretation. We propose that GSEA can be adapted to incorporate SNP and gene-level interactions. To this end, gene scores are derived by Relief-based feature importance algorithms that efficiently detect both univariate and interaction effects (MultiSURF) or exclusively interaction effects (MultiSURF*).
The population of patients with pulmonary arterial hypertension (PAH) has evolved over time from predominantly young White women to an older, more racially diverse and obese population. Whether these changes are reflected in clinical trials is not known. To determine secular and regional trends among PAH trial participants.
Objective: Data harmonization is essential to integrate individual participant data from multiple sites, time periods, and trials for meta-analysis. The process of mapping terms and phrases to an ontology is complicated by typographic errors, abbreviations, truncation, and plurality. We sought to harmonize medical history (MH) and adverse events (AE) term records across 21 randomized clinical trials in pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension.
AMIA Jt Summits Transl Sci Proc
September 2021
Growing demand for biomedical informaticists and expertise in areas related to this discipline has accentuated the need to integrate biomedical informatics training into high school curricula. The K-12 Bioinformatics professional development project educates high school teachers about data analysis, biomedical informatics and mobile learning, and partners with them to expose high school students to health and environment-related issues using biomedical informatics knowledge and current technologies. We designed low-cost pollution sensors and created interactive web applications that teachers from six Philadelphia public high schools used during the 2019-2020 school year to successfully implement a problem-based mobile learning unit that included collecting and interpreting air pollution data, as well as relating this data to asthma.