Publications by authors named "Faghri F"

Background: Biomedical research requires sophisticated understanding and reasoning across multiple specializations. While large language models (LLMs) show promise in scientific applications, their capability to safely and accurately support complex biomedical research remains uncertain.

Methods: We present , a novel question-and-answer benchmark for evaluating LLMs in biomedical research.


TDP-43 mislocalization and pathology occur across a range of neurodegenerative diseases, but the pathways that modulate TDP-43 in neurons are not well understood. We generated a Halo-TDP-43 knock-in iPSC line and performed a genome-wide CRISPR interference FACS-based screen to identify modifiers of TDP-43 levels in neurons. A meta-analysis of our screen and publicly available screens identified both specific hits and pathways present across multiple screens, the latter likely responsible for generic protein level maintenance.

Article Synopsis
  • The study highlights the complexity of Alzheimer's disease and related dementias, emphasizing the need to understand genetic and environmental factors that vary across different ancestries for personalized treatment approaches.
  • Utilizing large-scale biobank data, the research characterized genetic variants associated with Alzheimer's across 11 ancestries, identifying 116 potentially linked variants, including 18 known pathogenic ones and 98 new variants.
  • The findings revealed significant ancestry-driven differences in disease risk, including a higher presence of ε4/ε4 carriers in African ancestries, suggesting the importance of considering genetics in diverse populations to enhance understanding and treatment of AD/ADRDs.

Background: Alzheimer's disease and related dementias (ADRD) and Parkinson's disease (PD) are the most common neurodegenerative conditions. These central nervous system disorders impact both the structure and function of the brain and may lead to imaging changes that precede symptoms. Patients with ADRD or PD have long asymptomatic phases that exhibit significant heterogeneity.

Article Synopsis
  • GenoTools is a Python package designed to simplify population genetics research by integrating key functions like ancestry estimation, quality control, and genome-wide association studies into streamlined pipelines.
  • It allows users to track samples and variants across customizable processes, making it easier to handle genetics data for studies of any size.
  • The tool is utilized in major initiatives like the NIH's Alzheimer's program and has successfully processed vast datasets, contributing to new discoveries and ensuring reliable ancestry predictions and robust quality control in genetic studies.
Article Synopsis
  • The paper explores using Large Language Models (LLMs) to streamline data wrangling and automate tasks in data discovery and harmonization, crucial for making biomedical data AI-ready by developing Common Data Elements (CDEs).
  • A human-in-the-loop approach was utilized to ensure the accuracy of generated CDEs from various studies and databases, achieving a high accuracy rate where 94.0% of fields required no manual changes, with an interoperability mapping rate of 32.4%.
  • The resulting CDEs are designed to improve dataset compatibility by measuring how well different data sources align with these standards, ultimately enhancing the efficiency and scalability of biomedical research efforts.

Background: Commercial genome-wide genotyping arrays have historically neglected coverage of genetic variation across populations.

Objective: We aimed to create a multi-ancestry genome-wide array that would include a wide range of neuro-specific genetic content to facilitate genetic research in neurological disorders across multiple ancestral groups, fostering diversity and inclusivity in research studies.

Methods: We developed the Illumina NeuroBooster Array (NBA), a custom high-throughput and cost-effective platform on a backbone of 1,914,934 variants from the Infinium Global Diversity Array and added custom content comprising 95,273 variants associated with more than 70 neurological conditions or traits, and we further tested its performance on more than 2000 patient samples.


GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control (QC), and genome-wide association studies (GWAS) capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, users can easily manage genetics data for large and small studies. GenoTools' "Ancestry" module renders highly accurate predictions, allowing for high-quality ancestry-specific studies, and enables custom ancestry model training and serialization, specified to the user's genotyping or sequencing platform.


While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's disease prediction.
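
As a rough illustration of the federated setup described above, the sketch below simulates weighted federated averaging over two sites that never share raw data. It is a toy under stated assumptions, not the study's method: the abstract does not name an aggregation algorithm, so the averaging rule, the linear model, the function names (`local_update`, `federated_average`, `run_fedavg`) and the data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature x, target y)


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Train y ~ w*x + b on one site's private data with gradient descent."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Average site models, weighting each site by its sample count."""
    total = sum(site_sizes)
    dims = len(site_weights[0])
    return [
        sum(w[d] * n for w, n in zip(site_weights, site_sizes)) / total
        for d in range(dims)
    ]


def run_fedavg(sites: Sequence[Sequence[Sample]], rounds: int = 500) -> List[float]:
    """Only model weights leave each site; raw samples stay local."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


# Hypothetical toy data: both sites sample the same relationship y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = run_fedavg(sites)
print(f"slope={w:.3f}, intercept={b:.3f}")
```

A classically trained baseline would instead pool all samples in one place and fit a single model; comparing the two on held-out data is the kind of evaluation the abstract describes.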

Article Synopsis
  • Recent FDA approvals of Alzheimer's treatments such as lecanemab and aducanumab emphasize the need for understanding biological mechanisms to develop effective therapies for neurodegenerative disorders.
  • The study utilizes genetic data to identify 116 genes associated with Alzheimer's and other neurodegenerative diseases, classifying them based on their potential for drug development.
  • A new web platform is introduced to help researchers easily explore therapeutic targets, encouraging further investigation and collaboration in tackling these challenging diseases.

Genome-wide genotyping platforms have the capacity to capture genetic variation across different populations, but there have been disparities in the representation of population-dependent genetic diversity. Our aim was to create a comprehensive genome-wide array covering a wide range of neuro-specific content for the Global Parkinson's Genetics Program (GP2) and the Center for Alzheimer's and Related Dementias (CARD). CARD aims to increase diversity in genetic studies, using this array as a tool to foster inclusivity.


While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's disease prediction.

Article Synopsis
  • The study analyzes over one million electronic health records to explore the link between sleep disorders and the development of neurodegenerative diseases (NDDs) like Alzheimer's and Parkinson's later in life.
  • Findings indicate that severe sleep disorders, such as sleep apnea, significantly increase the risk of various NDDs, potentially up to 15 years before symptoms appear.
  • The research suggests that addressing modifiable sleep-related risk factors may help mitigate neurodegeneration risk, particularly in individuals with lower genetic predispositions to these diseases.

Here, we present a standardized, "off-the-shelf" proteomics pipeline working in a single 96-well plate to achieve deep coverage of cellular proteomes with high throughput and scalability. This integrated pipeline combines a fully automated sample preparation platform, data-independent acquisition (DIA) coupled with a high-field asymmetric waveform ion mobility spectrometry (FAIMS) interface, and an optimized library-free DIA database search strategy. Our systematic evaluation of FAIMS-DIA shows that a single compensation voltage (CV) of -35 V not only yields the deepest proteome coverage but also correlates best with DIA without FAIMS.


Despite the key roles of perilipin-2 (PLIN2) in governing lipid droplet (LD) metabolism, the mechanisms that regulate PLIN2 levels remain incompletely understood. Here, we leverage a set of genome-edited human PLIN2 reporter cell lines in a series of CRISPR-Cas9 loss-of-function screens, identifying genetic modifiers that influence PLIN2 expression and post-translational stability under different metabolic conditions and in different cell types. These regulators include canonical genes that control lipid metabolism as well as genes involved in ubiquitination, transcription, and mitochondrial function.


High-dimensional data analysis starts with projecting the data to low dimensions to visualize and understand the underlying data structure. Several methods have been developed for dimensionality reduction, but they are limited to cross-sectional datasets. The recently proposed Aligned-UMAP, an extension of the uniform manifold approximation and projection (UMAP) algorithm, can visualize high-dimensional longitudinal datasets.

Article Synopsis
  • Neurodegenerative diseases (NDDs) often share similar symptoms and genetic risk factors, indicating a possible interconnectedness among them.
  • This study clusters patients with five major NDDs using genetic data, revealing significant overlaps in genetic causes and supporting the idea of neurodegeneration as a spectrum.
  • The findings suggest that some patients lack common genetic risk factors, hinting at other influences like environmental factors, and emphasize the need for further research to understand how these variants affect disease development and treatment.
Article Synopsis
  • Recent FDA approvals of drugs such as lecanemab and aducanumab for Alzheimer's disease underscore the need for better treatments for neurodegenerative disorders, especially as the global population ages.
  • This study presents a comprehensive framework to identify therapeutic targets using genetic data and provides insights into the mechanisms of disease, identifying numerous target genes for various conditions including Alzheimer's and Parkinson's disease.
  • A user-friendly web platform is also created to allow researchers and the community to easily explore these therapeutic targets, facilitating future drug discovery and development.

With recent findings connecting the Epstein-Barr virus to an increased risk of multiple sclerosis and growing concerns regarding the neurological impact of the coronavirus pandemic, we examined potential links between viral exposures and neurodegenerative disease risk. Using time series data from FinnGen for discovery and cross-sectional data from the UK Biobank for replication, we identified 45 viral exposures significantly associated with increased risk of neurodegenerative disease and replicated 22 of these associations. The largest effect association was between viral encephalitis exposure and Alzheimer's disease.


Background: Gene set enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) plays an important role in bioinformatics focused on diseases of genetic basis. To facilitate phenotype-oriented gene set analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which (1) performs phenotype and disease enrichment analysis on a gene set; (2) measures statistically significant phenotype similarities between gene sets; and (3) detects significant differential phenotypes or disease terms across different databases.

Results: PhenoExam generates sensitive and accurate phenotype enrichment analyses.
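
For readers unfamiliar with enrichment analysis, the sketch below shows one common way to test whether a phenotype term is over-represented in a gene set: a one-sided hypergeometric test. This is a generic illustration, not PhenoExam's implementation; the abstract does not state which test the package uses, and the function name `hypergeom_enrichment_p` and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, annotated: int,
                           gene_set_size: int, overlap: int) -> float:
    """One-sided p-value P(X >= overlap) under a hypergeometric null.

    universe      -- number of genes in the background
    annotated     -- background genes carrying the phenotype term
    gene_set_size -- number of genes in the query set
    overlap       -- query genes that carry the term
    """
    upper = min(annotated, gene_set_size)
    total = comb(universe, gene_set_size)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, gene_set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 200 annotated with a term,
# a 50-gene query set, 5 of which carry the term.
p_value = hypergeom_enrichment_p(20000, 200, 50, 5)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice the test is repeated for every term in a phenotype database, so the resulting p-values need a multiple-testing correction before any term is called significant.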


The clinical manifestations of Parkinson's disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the disease course. We used unsupervised and supervised machine learning methods on comprehensive, longitudinal clinical data from the Parkinson's Disease Progression Marker Initiative (n = 294 cases) to identify patient subtypes and to predict disease progression.


Human induced pluripotent stem cell (iPSC) lines are a powerful tool for studying development and disease, but the considerable phenotypic variation between lines makes it challenging to replicate key findings and integrate data across research groups. To address this issue, we sub-cloned candidate human iPSC lines and deeply characterized their genetic properties using whole genome sequencing, their genomic stability upon CRISPR-Cas9-based gene editing, and their phenotypic properties including differentiation to commonly used cell types. These studies identified KOLF2.


Microglia are emerging as key drivers of neurological diseases. However, we lack a systematic understanding of the underlying mechanisms. Here, we present a screening platform to systematically elucidate functional consequences of genetic perturbations in human induced pluripotent stem cell-derived microglia.


Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson's disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort.


Background: Amyotrophic lateral sclerosis (ALS) is known to represent a collection of overlapping syndromes. Various classification systems based on empirical observations have been proposed, but it is unclear to what extent they reflect ALS population substructures. We aimed to use machine-learning techniques to identify the number and nature of ALS subtypes to obtain a better understanding of this heterogeneity, enhance our understanding of the disease, and improve clinical care.
