Publications by Faghri F | LitMetric

Publications by authors named "Faghri F"

CARDBiomedBench: A Benchmark for Evaluating Large Language Model Performance in Biomedical Research.

Owen Bianchi Maya Willey Chelsea X Alvarado Benjamin Danek Marzieh Khani

bioRxiv

January 2025

Background: Biomedical research requires sophisticated understanding and reasoning across multiple specializations. While large language models (LLMs) show promise in scientific applications, their capability to safely and accurately support complex biomedical research remains uncertain.

Methods: We present CARDBiomedBench, a novel question-and-answer benchmark for evaluating LLMs in biomedical research.

Maintenance of neuronal TDP-43 expression requires axonal lysosome transport.

Veronica H Ryan Sydney Lawton Joel F Reyes James Hawrot Ashley M Frankenfield

bioRxiv

October 2024

TDP-43 mislocalization and pathology occur across a range of neurodegenerative diseases, but the pathways that modulate TDP-43 in neurons are not well understood. We generated a Halo-TDP-43 knock-in iPSC line and performed a genome-wide CRISPR interference FACS-based screen to identify modifiers of TDP-43 levels in neurons. A meta-analysis of our screen and publicly available screens identified both specific hits and pathways present across multiple screens, the latter likely responsible for generic protein level maintenance.

Biobank-scale characterization of Alzheimer's disease and related dementias identifies potential disease-causing variants, risk factors, and genetic modifiers across diverse ancestries.

Marzieh Khani Fulya Akçimen Spencer M Grant S Can Akerman Paul Suhwan Lee

medRxiv

November 2024

Article Synopsis

- The study highlights the complexity of Alzheimer's disease and related dementias, emphasizing the need to understand genetic and environmental factors that vary across different ancestries for personalized treatment approaches.
- Utilizing large-scale biobank data, the research characterized genetic variants associated with Alzheimer's across 11 ancestries, identifying 116 potentially linked variants, including 18 known pathogenic ones and 98 new variants.
- The findings revealed significant ancestry-driven differences in disease risk, including a higher frequency of APOE ε4/ε4 carriers in African ancestries, underscoring the importance of studying genetics in diverse populations to improve the understanding and treatment of AD/ADRD.

Prediction, prognosis and monitoring of neurodegeneration at biobank-scale via machine learning and imaging.

Anant Dadu Michael Ta Nicholas J Tustison Ali Daneshmand Ken Marek

medRxiv

October 2024

Background: Alzheimer's disease and related dementias (ADRD) and Parkinson's disease (PD) are the most common neurodegenerative conditions. These central nervous system disorders impact both the structure and function of the brain and may lead to imaging changes that precede symptoms. Patients with ADRD or PD have long asymptomatic phases that exhibit significant heterogeneity.

GenoTools: an open-source Python package for efficient genotype data quality control and analysis.

Dan Vitale Mathew J Koretsky Nicole Kuznetsov Samantha Hong Jessica Martin

G3 (Bethesda)

January 2025

Article Synopsis

- GenoTools is a Python package designed to simplify population genetics research by integrating key functions like ancestry estimation, quality control, and genome-wide association studies into streamlined pipelines.
- It allows users to track samples and variants across customizable processes, making it easier to handle genetics data for studies of any size.
- The tool is used in major initiatives such as the NIH's Alzheimer's program and has successfully processed vast datasets, contributing to new discoveries and ensuring reliable ancestry predictions and robust quality control in genetic studies.

A new AI-assisted data standard accelerates interoperability in biomedical research.

Rodney Alan Long Shannon Ballard Syed Shah Owen Bianchi Lietsel Jones

medRxiv

November 2024

Article Synopsis

- The paper explores using Large Language Models (LLMs) to streamline data wrangling and automate tasks in data discovery and harmonization, crucial for making biomedical data AI-ready by developing Common Data Elements (CDEs).
- A human-in-the-loop approach was used to verify the CDEs generated from various studies and databases and achieved high accuracy: 94.0% of fields required no manual changes, and the interoperability mapping rate was 32.4%.
- The resulting CDEs are designed to improve dataset compatibility by measuring how well different data sources align with these standards, ultimately enhancing the efficiency and scalability of biomedical research efforts.

NeuroBooster Array: A Genome-Wide Genotyping Platform to Study Neurological Disorders Across Diverse Populations.

Sara Bandres-Ciga Faraz Faghri Elisa Majounie Mathew J Koretsky Jeffrey Kim

Mov Disord

November 2024

Background: Commercial genome-wide genotyping arrays have historically neglected coverage of genetic variation across populations.

Objective: We aimed to create a multi-ancestry genome-wide array that would include a wide range of neuro-specific genetic content to facilitate genetic research in neurological disorders across multiple ancestral groups, fostering diversity and inclusivity in research studies.

Methods: We developed the Illumina NeuroBooster Array (NBA), a custom, high-throughput, and cost-effective platform built on a backbone of 1,914,934 variants from the Infinium Global Diversity Array, to which we added custom content comprising 95,273 variants associated with more than 70 neurological conditions or traits. We further tested its performance on more than 2000 patient samples.

GenoTools: An Open-Source Python Package for Efficient Genotype Data Quality Control and Analysis.

Dan Vitale Mathew Koretsky Nicole Kuznetsov Samantha Hong Jessica Martin

bioRxiv

July 2024

GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control (QC), and genome-wide association studies (GWAS) capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, users can easily manage genetics data for large and small studies. GenoTools' "Ancestry" module renders highly accurate predictions, allowing for high-quality ancestry-specific studies, and enables custom ancestry model training and serialization, specific to the user's genotyping or sequencing platform.
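
The idea of tracking which samples each QC step removes can be illustrated with a small sketch. This is not GenoTools' API: the functions `call_rate_filter`, `het_filter`, and `run_pipeline`, the 0/1/2 genotype coding, and the thresholds are all hypothetical, and real QC typically runs on PLINK-format files with more careful statistics (e.g. heterozygosity F-coefficients).

```python
from dataclasses import dataclass, field

# Hypothetical sketch, not GenoTools code. Genotypes are coded as 0/1/2
# alternate-allele counts; None marks a missing call.


@dataclass
class QCLog:
    """Records, for each QC step, which samples it removed and how many remain."""
    steps: list = field(default_factory=list)


def call_rate_filter(samples, min_call_rate):
    """Drop samples whose fraction of non-missing genotypes is below min_call_rate."""
    kept, dropped = {}, []
    for sample_id, genotypes in samples.items():
        call_rate = sum(g is not None for g in genotypes) / len(genotypes)
        if call_rate >= min_call_rate:
            kept[sample_id] = genotypes
        else:
            dropped.append(sample_id)
    return kept, dropped


def het_filter(samples, low, high):
    """Drop samples whose heterozygous fraction of called genotypes is outside [low, high]."""
    kept, dropped = {}, []
    for sample_id, genotypes in samples.items():
        called = [g for g in genotypes if g is not None]
        het = sum(g == 1 for g in called) / len(called) if called else 0.0
        if low <= het <= high:
            kept[sample_id] = genotypes
        else:
            dropped.append(sample_id)
    return kept, dropped


def run_pipeline(samples, steps):
    """Apply (name, step) pairs in order, logging what each step removed."""
    log = QCLog()
    for name, step in steps:
        samples, dropped = step(samples)
        log.steps.append((name, dropped, len(samples)))
    return samples, log


samples = {
    "S1": [0, 1, 2, 1, 0],        # complete, 40% heterozygous
    "S2": [0, None, None, 1, 2],  # 60% call rate
    "S3": [1, 1, 1, 1, 1],        # complete, 100% heterozygous
}
steps = [
    ("callrate", lambda s: call_rate_filter(s, 0.95)),  # hypothetical threshold
    ("het", lambda s: het_filter(s, 0.0, 0.6)),         # hypothetical bounds
]
kept, log = run_pipeline(samples, steps)
for name, dropped, remaining in log.steps:
    print(f"{name}: dropped {dropped}, {remaining} remaining")
```

Because every step returns both the surviving samples and the removed IDs, the log can explain afterwards why any individual sample left the pipeline.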

Federated learning for multi-omics: A performance evaluation in Parkinson's disease.

Benjamin P Danek Mary B Makarious Anant Dadu Dan Vitale Paul Suhwan Lee

Patterns (N Y)

March 2024

While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's disease prediction.
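
As a rough illustration of how FL lets institutions train a shared model without pooling raw data, here is a minimal sketch of federated averaging (FedAvg), one common FL strategy. It is not necessarily the aggregation scheme or model family evaluated in the paper. The linear model, learning rate, round counts, and toy site data are all assumptions chosen for clarity.

```python
def local_update(w, b, data, lr=0.1, epochs=20):
    """Run plain gradient descent on mean squared error for y ~ w*x + b,
    using only one site's private data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(sites, rounds=50):
    """Federated averaging: each round, every site trains locally starting from the
    shared model, then the server averages the returned parameters, weighted by
    each site's sample size. Raw data never leaves a site."""
    w, b = 0.0, 0.0
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        updates = [(local_update(w, b, data), len(data)) for data in sites]
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
    return w, b


# Two hypothetical sites whose private data follow y = 2x + 1;
# only the parameters (w, b) are shared with the server.
site_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
site_b = [(1.5, 4.0), (2.0, 5.0)]
w, b = fed_avg([site_a, site_b])
print(f"w={w:.3f}, b={b:.3f}")
```

Weighting by site size makes the aggregate behave more like training on the pooled data; with heterogeneous (non-IID) sites, FL results can diverge from centralized training, which is one reason simulated comparisons like the one above are informative.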

omicSynth: An open multi-omic community resource for identifying druggable targets across neurodegenerative diseases.

Chelsea X Alvarado Mary B Makarious Cory A Weller Dan Vitale Mathew J Koretsky

Am J Hum Genet

January 2024

Article Synopsis

- Recent FDA approvals of Alzheimer's disease treatments such as lecanemab and aducanumab emphasize the need to understand biological mechanisms in order to develop effective therapies for neurodegenerative disorders.
- The study uses genetic data to identify 116 genes associated with Alzheimer's disease and other neurodegenerative diseases, classifying them by their potential for drug development.
- A new web platform helps researchers explore therapeutic targets, encouraging further investigation and collaboration on these challenging diseases.

NeuroBooster Array: A Genome-Wide Genotyping Platform to Study Neurological Disorders Across Diverse Populations.

Sara Bandres-Ciga Faraz Faghri Elisa Majounie Mathew J Koretsky Jeffrey Kim

medRxiv

November 2023

Genome-wide genotyping platforms have the capacity to capture genetic variation across different populations, but there have been disparities in the representation of population-dependent genetic diversity. The motivation for pursuing this endeavor was to create a comprehensive genome-wide array capable of encompassing a wide range of neuro-specific content for the Global Parkinson's Genetics Program (GP2) and the Center for Alzheimer's and Related Dementias (CARD). CARD aims to increase diversity in genetic studies, using this array as a tool to foster inclusivity.

Federated Learning for multi-omics: a performance evaluation in Parkinson's disease.

Benjamin Danek Mary B Makarious Anant Dadu Dan Vitale Paul Suhwan Lee

bioRxiv

February 2024

While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to sufficiently large, high-quality datasets needed to train ML models. Federated Learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's Disease prediction.

Sleep disturbances as risk factors for neurodegeneration later in life.

Emily Simmonds Kristin S Levine Jun Han Hirotaka Iwaki Mathew J Koretsky

medRxiv

November 2023

Article Synopsis

- The study analyzes over one million electronic health records to explore the link between sleep disorders and the later development of neurodegenerative diseases (NDDs) such as Alzheimer's and Parkinson's.
- Findings indicate that severe sleep disorders, such as sleep apnea, significantly increase the risk of various NDDs, potentially up to 15 years before symptoms appear.
- The research suggests that addressing modifiable sleep-related risk factors may help mitigate neurodegeneration risk, particularly in individuals with lower genetic predisposition to these diseases.

A fully automated FAIMS-DIA mass spectrometry-based proteomic pipeline.

Luke Reilly Erika Lara Daniel Ramos Ziyi Li Caroline B Pantazis

Cell Rep Methods

October 2023

Here, we present a standardized, "off-the-shelf" proteomics pipeline that works in a single 96-well plate to achieve deep coverage of cellular proteomes with high throughput and scalability. This integrated pipeline combines a fully automated sample preparation platform, data-independent acquisition (DIA) coupled with a high-field asymmetric waveform ion mobility spectrometry (FAIMS) interface, and an optimized library-free DIA database search strategy. Our systematic evaluation of FAIMS-DIA shows that a single compensation voltage (CV) of -35 V not only yields the deepest proteome coverage but also correlates best with DIA without FAIMS.

Parallel CRISPR-Cas9 screens identify mechanisms of PLIN2 and lipid droplet regulation.

Melissa A Roberts Kirandeep K Deol Alyssa J Mathiowetz Mike Lange Dara E Leto

Dev Cell

September 2023

Despite the key roles of perilipin-2 (PLIN2) in governing lipid droplet (LD) metabolism, the mechanisms that regulate PLIN2 levels remain incompletely understood. Here, we leverage a set of genome-edited human PLIN2 reporter cell lines in a series of CRISPR-Cas9 loss-of-function screens, identifying genetic modifiers that influence PLIN2 expression and post-translational stability under different metabolic conditions and in different cell types. These regulators include canonical genes that control lipid metabolism as well as genes involved in ubiquitination, transcription, and mitochondrial function.

Application of Aligned-UMAP to longitudinal biomedical studies.

Anant Dadu Vipul K Satone Rachneet Kaur Mathew J Koretsky Hirotaka Iwaki

Patterns (N Y)

June 2023

High-dimensional data analysis starts with projecting the data to low dimensions to visualize and understand the underlying data structure. Several methods have been developed for dimensionality reduction, but they are limited to cross-sectional datasets. The recently proposed Aligned-UMAP, an extension of the uniform manifold approximation and projection (UMAP) algorithm, can visualize high-dimensional longitudinal datasets.

Genetic risk factor clustering within and across neurodegenerative diseases.

Mathew J Koretsky Chelsea Alvarado Mary B Makarious Dan Vitale Kristin Levine

Brain

November 2023

Article Synopsis

- Neurodegenerative diseases (NDDs) often share similar symptoms and genetic risk factors, indicating a possible interconnectedness among them.
- This study clusters patients with five major NDDs using genetic data, revealing significant overlaps in genetic causes and supporting the idea of neurodegeneration as a spectrum.
- The findings suggest that some patients lack common genetic risk factors, hinting at other influences like environmental factors, and emphasize the need for further research to understand how these variants affect disease development and treatment.

omicSynth: an Open Multi-omic Community Resource for Identifying Druggable Targets across Neurodegenerative Diseases.

Chelsea X Alvarado Mary B Makarious Cory A Weller Dan Vitale Mathew J Koretsky

medRxiv

July 2023

Article Synopsis

- Recent FDA approvals like lecanemab and aducanumab for Alzheimer's disease stress the need for better treatments for neurodegenerative disorders, especially as the global population ages.
- This study presents a comprehensive framework to identify therapeutic targets using genetic data and provides insights into the mechanisms of disease, identifying numerous target genes for various conditions including Alzheimer's and Parkinson's disease.
- A user-friendly web platform is also created to allow researchers and the community to easily explore these therapeutic targets, facilitating future drug discovery and development.

Virus exposure and neurodegenerative disease risk across national biobanks.

Kristin S Levine Hampton L Leonard Cornelis Blauwendraat Hirotaka Iwaki Nicholas Johnson

Neuron

April 2023

With recent findings connecting the Epstein-Barr virus to an increased risk of multiple sclerosis and growing concerns regarding the neurological impact of the coronavirus pandemic, we examined potential links between viral exposures and neurodegenerative disease risk. Using time series data from FinnGen for discovery and cross-sectional data from the UK Biobank for replication, we identified 45 viral exposures significantly associated with increased risk of neurodegenerative disease and replicated 22 of these associations. The largest effect association was between viral encephalitis exposure and Alzheimer's disease.
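
A basic measure behind exposure-disease associations like these is the odds ratio from a 2x2 table of exposed/unexposed cases and controls. The sketch below computes it with a Woolf (log-scale) 95% confidence interval. This is a simplified cross-sectional illustration, not the time-series modeling or replication procedure used in the study, and the counts are invented.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table:
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: add a continuity correction or use an exact test")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from FinnGen or the UK Biobank.
or_, lo, hi = odds_ratio_ci(a=30, b=70, c=100, d=800)
print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval whose lower bound exceeds 1 indicates a positive association at roughly the 5% level; with dozens of exposures tested, as in the study, a multiple-testing correction would also be needed.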

PhenoExam: gene set analyses through integration of different phenotype databases.

Alejandro Cisterna Aurora González-Vidal Daniel Ruiz Jordi Ortiz Alicia Gómez-Pascual

BMC Bioinformatics

December 2022

Background: Gene set enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) plays an important role in bioinformatics focused on diseases with a genetic basis. To facilitate phenotype-oriented gene set analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which (1) performs phenotype and disease enrichment analysis on a gene set, (2) measures statistically significant phenotype similarities between gene sets, and (3) detects significant differential phenotypes or disease terms across different databases.

Results: PhenoExam generates sensitive and accurate phenotype enrichment analyses.
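
Phenotype enrichment of a gene set is commonly scored with a one-sided hypergeometric test: how surprising is the overlap between the query genes and the genes annotated to a phenotype term, given the background gene universe? The sketch below shows that standard test. It is an assumption that it matches PhenoExam's exact statistics, and the gene names and set sizes are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def term_enrichment(gene_set, term_genes, universe):
    """Return (overlap, one-sided p-value) for a gene set against one phenotype term."""
    query = set(gene_set) & universe
    term = set(term_genes) & universe
    overlap = len(query & term)
    p = hypergeom_sf(overlap, len(universe), len(term), len(query))
    return overlap, p


# Hypothetical toy data: 20 background genes, a 5-gene phenotype term, a 4-gene query set.
universe = {f"G{i}" for i in range(20)}
term = {f"G{i}" for i in range(5)}
query = {"G0", "G1", "G2", "G10"}
overlap, p = term_enrichment(query, term, universe)
print(overlap, round(p, 4))
```

In practice every phenotype term in a database is tested, so the resulting p-values need a multiple-testing correction such as Benjamini-Hochberg.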

Identification and prediction of Parkinson's disease subtypes and progression using machine learning in two cohorts.

Anant Dadu Vipul Satone Rachneet Kaur Sayed Hadi Hashemi Hampton Leonard

NPJ Parkinsons Dis

December 2022

The clinical manifestations of Parkinson's disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the disease course. We used unsupervised and supervised machine learning methods on comprehensive, longitudinal clinical data from the Parkinson's Progression Markers Initiative (n = 294 cases) to identify patient subtypes and to predict disease progression.
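
To make "unsupervised subtyping" concrete, the sketch below runs Lloyd's k-means, one widely used clustering algorithm, on a few patients. It does not reproduce the paper's pipeline (its feature engineering, dimensionality reduction, and choice of clustering method are not described here); the two summary scores per patient and the initial centroids are hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points, until the centroids stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
            labels.append(nearest)
        # Keep an empty cluster's centroid where it was.
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical patients described by two summary scores (e.g. motor, cognitive).
patients = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(patients, centroids=[patients[0], patients[3]])
print(labels, centers)
```

The resulting cluster labels can then serve as targets for a supervised model that predicts a new patient's subtype from baseline data, which mirrors the unsupervised-then-supervised structure described above.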

A reference human induced pluripotent stem cell line for large-scale collaborative studies.

Caroline B Pantazis Andrian Yang Erika Lara Justin A McDonough Cornelis Blauwendraat

Cell Stem Cell

December 2022

Human induced pluripotent stem cell (iPSC) lines are a powerful tool for studying development and disease, but the considerable phenotypic variation between lines makes it challenging to replicate key findings and integrate data across research groups. To address this issue, we sub-cloned candidate human iPSC lines and deeply characterized their genetic properties using whole genome sequencing, their genomic stability upon CRISPR-Cas9-based gene editing, and their phenotypic properties including differentiation to commonly used cell types. These studies identified KOLF2.

A CRISPRi/a platform in human iPSC-derived microglia uncovers regulators of disease states.

Nina M Dräger Sydney M Sattler Cindy Tzu-Ling Huang Olivia M Teter Kun Leng

Nat Neurosci

September 2022

Microglia are emerging as key drivers of neurological diseases. However, we lack a systematic understanding of the underlying mechanisms. Here, we present a screening platform to systematically elucidate functional consequences of genetic perturbations in human induced pluripotent stem cell-derived microglia.

Multi-modality machine learning predicting Parkinson's disease.

Mary B Makarious Hampton L Leonard Dan Vitale Hirotaka Iwaki Lana Sargent

NPJ Parkinsons Dis

April 2022

Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson's disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort.

Identifying and predicting amyotrophic lateral sclerosis clinical subgroups: a population-based machine-learning study.

Faraz Faghri Fabian Brunn Anant Dadu

Lancet Digit Health

May 2022

Background: Amyotrophic lateral sclerosis (ALS) is known to represent a collection of overlapping syndromes. Various classification systems based on empirical observations have been proposed, but it is unclear to what extent they reflect ALS population substructures. We aimed to use machine-learning techniques to identify the number and nature of ALS subtypes to obtain a better understanding of this heterogeneity, enhance our understanding of the disease, and improve clinical care.
