Background: Biomedical research requires sophisticated understanding and reasoning across multiple specializations. While large language models (LLMs) show promise in scientific applications, their capability to safely and accurately support complex biomedical research remains uncertain.
Methods: We present , a novel question-and-answer benchmark for evaluating LLMs in biomedical research.
TDP-43 mislocalization and pathology occur across a range of neurodegenerative diseases, but the pathways that modulate TDP-43 in neurons are not well understood. We generated a Halo-TDP-43 knock-in iPSC line and performed a genome-wide CRISPR interference FACS-based screen to identify modifiers of TDP-43 levels in neurons. A meta-analysis of our screen and publicly available screens identified both specific hits and pathways present across multiple screens, the latter likely responsible for generic protein level maintenance.
Background: Alzheimer's disease and related dementias (ADRD) and Parkinson's disease (PD) are the most common neurodegenerative conditions. These central nervous system disorders impact both the structure and function of the brain and may lead to imaging changes that precede symptoms. Patients with ADRD or PD have long asymptomatic phases that exhibit significant heterogeneity.
Background: Commercial genome-wide genotyping arrays have historically neglected coverage of genetic variation across populations.
Objective: We aimed to create a multi-ancestry genome-wide array that would include a wide range of neuro-specific genetic content to facilitate genetic research in neurological disorders across multiple ancestral groups, fostering diversity and inclusivity in research studies.
Methods: We developed the Illumina NeuroBooster Array (NBA), a custom high-throughput and cost-effective platform on a backbone of 1,914,934 variants from the Infinium Global Diversity Array and added custom content comprising 95,273 variants associated with more than 70 neurological conditions or traits, and we further tested its performance on more than 2000 patient samples.
GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control (QC), and genome-wide association studies (GWAS) capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, users can easily manage genetics data for large and small studies. GenoTools' "Ancestry" module renders highly accurate predictions, allowing for high-quality ancestry-specific studies, and enables custom ancestry model training and serialization, specified to the user's genotyping or sequencing platform.
While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to the sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's disease prediction.
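As a rough illustration of how FL can combine models without pooling the underlying data, the sketch below shows one aggregation round in the style of federated averaging. The abstract does not name the aggregation scheme the authors used, so federated averaging, the function name `federated_average`, and the example site sizes are all assumptions made for illustration.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site model parameters into one global parameter vector.

    Hypothetical helper: each site trains locally and shares only its
    parameter vector and its sample count, never the raw omics data.
    The global parameters are the sample-size-weighted mean.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site and at least one site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    n_params = len(site_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        if len(weights) != n_params:
            raise ValueError("all sites must share the same model shape")
        # Each site contributes in proportion to its share of the samples.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical sites: 100 and 300 samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
```

Weighting by sample count keeps a small site from dominating the global model; a real deployment would repeat this round many times and would typically add secure communication between sites.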
Genome-wide genotyping platforms have the capacity to capture genetic variation across different populations, but there have been disparities in the representation of population-dependent genetic diversity. The motivation for pursuing this endeavor was to create a comprehensive genome-wide array capable of encompassing a wide range of neuro-specific content for the Global Parkinson's Genetics Program (GP2) and the Center for Alzheimer's and Related Dementias (CARD). CARD aims to increase diversity in genetic studies, using this array as a tool to foster inclusivity.
Cell Rep Methods
October 2023
Here, we present a standardized, "off-the-shelf" proteomics pipeline working in a single 96-well plate to achieve deep coverage of cellular proteomes with high throughput and scalability. This integrated pipeline streamlines a fully automated sample preparation platform, a data-independent acquisition (DIA) coupled with high-field asymmetric waveform ion mobility spectrometer (FAIMS) interface, and an optimized library-free DIA database search strategy. Our systematic evaluation of FAIMS-DIA shows that a single compensation voltage (CV) of -35 V not only yields the deepest proteome coverage but also correlates best with DIA without FAIMS.
Despite the key roles of perilipin-2 (PLIN2) in governing lipid droplet (LD) metabolism, the mechanisms that regulate PLIN2 levels remain incompletely understood. Here, we leverage a set of genome-edited human PLIN2 reporter cell lines in a series of CRISPR-Cas9 loss-of-function screens, identifying genetic modifiers that influence PLIN2 expression and post-translational stability under different metabolic conditions and in different cell types. These regulators include canonical genes that control lipid metabolism as well as genes involved in ubiquitination, transcription, and mitochondrial function.
High-dimensional data analysis starts with projecting the data to low dimensions to visualize and understand the underlying data structure. Several methods have been developed for dimensionality reduction, but they are limited to cross-sectional datasets. The recently proposed Aligned-UMAP, an extension of the uniform manifold approximation and projection (UMAP) algorithm, can visualize high-dimensional longitudinal datasets.
With recent findings connecting the Epstein-Barr virus to an increased risk of multiple sclerosis and growing concerns regarding the neurological impact of the coronavirus pandemic, we examined potential links between viral exposures and neurodegenerative disease risk. Using time series data from FinnGen for discovery and cross-sectional data from the UK Biobank for replication, we identified 45 viral exposures significantly associated with increased risk of neurodegenerative disease and replicated 22 of these associations. The largest effect association was between viral encephalitis exposure and Alzheimer's disease.
Background: Gene set enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) plays an important role in bioinformatics focused on diseases of genetic basis. To facilitate phenotype-oriented gene set analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which: (1) performs phenotype and disease enrichment analysis on a gene set; (2) measures statistically significant phenotype similarities between gene sets; and (3) detects significant differential phenotypes or disease terms across different databases.
Results: PhenoExam generates sensitive and accurate phenotype enrichment analyses.
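To make the idea of term enrichment concrete, the sketch below computes a one-sided hypergeometric p-value for the overlap between a gene set and the genes annotated with one phenotype term. The abstract does not state which statistic PhenoExam uses, so this generic test, the function name `enrichment_p_value`, and the example counts are assumptions for illustration only.

```python
from math import comb


def enrichment_p_value(
    background_size: int,
    annotated: int,
    gene_set_size: int,
    overlap: int,
) -> float:
    """Probability of seeing at least `overlap` annotated genes by chance.

    Hypothetical helper implementing a one-sided hypergeometric test:
    draw `gene_set_size` genes without replacement from a background of
    `background_size` genes, of which `annotated` carry the term.
    """
    if not 0 <= annotated <= background_size:
        raise ValueError("annotated must lie between 0 and background_size")
    if not 0 <= gene_set_size <= background_size:
        raise ValueError("gene_set_size must lie between 0 and background_size")

    upper = min(annotated, gene_set_size)
    total = comb(background_size, gene_set_size)
    # Sum the upper tail: every overlap from the observed one to the maximum.
    tail = sum(
        comb(annotated, i) * comb(background_size - annotated, gene_set_size - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 annotated, a 5-gene set
# in which all 5 genes carry the term.
p = enrichment_p_value(10, 5, 5, 5)
```

In practice one such test is run per term, so the resulting p-values need a multiple-testing correction before any term is called significant.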
The clinical manifestations of Parkinson's disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the disease course. We used unsupervised and supervised machine learning methods on comprehensive, longitudinal clinical data from the Parkinson's Disease Progression Marker Initiative (n = 294 cases) to identify patient subtypes and to predict disease progression.
Human induced pluripotent stem cell (iPSC) lines are a powerful tool for studying development and disease, but the considerable phenotypic variation between lines makes it challenging to replicate key findings and integrate data across research groups. To address this issue, we sub-cloned candidate human iPSC lines and deeply characterized their genetic properties using whole genome sequencing, their genomic stability upon CRISPR-Cas9-based gene editing, and their phenotypic properties including differentiation to commonly used cell types. These studies identified KOLF2.
Microglia are emerging as key drivers of neurological diseases. However, we lack a systematic understanding of the underlying mechanisms. Here, we present a screening platform to systematically elucidate functional consequences of genetic perturbations in human induced pluripotent stem cell-derived microglia.
Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson's disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort.
Background: Amyotrophic lateral sclerosis (ALS) is known to represent a collection of overlapping syndromes. Various classification systems based on empirical observations have been proposed, but it is unclear to what extent they reflect ALS population substructures. We aimed to use machine-learning techniques to identify the number and nature of ALS subtypes to obtain a better understanding of this heterogeneity, enhance our understanding of the disease, and improve clinical care.