Publications by authors named "Mathew Koretsky"

Background: Biomedical research requires sophisticated understanding and reasoning across multiple specializations. While large language models (LLMs) show promise in scientific applications, their capability to safely and accurately support complex biomedical research remains uncertain.

Methods: We present a novel question-and-answer benchmark for evaluating LLMs in biomedical research.


Elucidating the genetic contributions to Parkinson's disease (PD) etiology across diverse ancestries is a critical priority for the development of targeted therapies in a global context. We conducted the largest sequencing characterization of potentially disease-causing, protein-altering and splicing mutations in 710 cases and 11,827 controls from genetically predicted African or African admixed ancestries. We explored copy number variants (CNVs) and runs of homozygosity (ROHs) in prioritized early-onset and familial cases.


Background: Known pathogenic variants in Parkinson's disease (PD) contribute to disease development but have yet to be fully explored by arrays at scale.

Objectives: This study evaluated genotyping success of the NeuroBooster array (NBA) and determined the frequencies of pathogenic variants across ancestries.

Method: We analyzed the presence and allele frequency of 34 pathogenic variants in 28,710 PD cases, 9,614 other neurodegenerative disorder cases, and 15,821 controls across 11 ancestries within the Global Parkinson's Genetics Program dataset.
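
A minimal sketch of how per-ancestry allele frequencies of the kind described above can be computed from genotype calls. It is not the study's pipeline: the function name, the ancestry labels, the variant IDs, and the toy data are all hypothetical, and it assumes diploid calls coded as alternate-allele counts (0, 1, 2) with `None` for a no-call.

```python
from collections import defaultdict


def allele_frequencies(samples):
    """Return {(ancestry, variant): alternate allele frequency}.

    Each sample is (ancestry, {variant_id: alt_allele_count}), where the
    count is 0, 1, or 2 for a diploid call and None for a no-call.
    """
    alt = defaultdict(int)      # alternate alleles observed
    called = defaultdict(int)   # total alleles with a successful call
    for ancestry, genotypes in samples:
        for variant, dosage in genotypes.items():
            if dosage is None:
                continue  # no-calls contribute to neither count
            alt[(ancestry, variant)] += dosage
            called[(ancestry, variant)] += 2
    return {key: alt[key] / called[key] for key in called}


# Hypothetical toy data: three samples, two made-up variants.
samples = [
    ("EUR", {"var_A": 1, "var_B": 0}),
    ("EUR", {"var_A": 0, "var_B": None}),
    ("AFR", {"var_A": 0, "var_B": 2}),
]
freqs = allele_frequencies(samples)
print(freqs[("EUR", "var_A")])  # 1 alternate allele out of 4 called alleles
```

Excluding no-calls from the denominator keeps a variant with poor genotyping success from appearing rarer than it is.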

Article Synopsis
  • The study highlights the complexity of Alzheimer's disease and related dementias, emphasizing the need to understand genetic and environmental factors that vary across different ancestries for personalized treatment approaches.
  • Utilizing large-scale biobank data, the research characterized genetic variants associated with Alzheimer's across 11 ancestries, identifying 116 potentially linked variants, including 18 known pathogenic ones and 98 new variants.
  • The findings revealed significant ancestry-driven differences in disease risk, including a higher presence of ε4/ε4 carriers in African ancestries, suggesting the importance of considering genetics in diverse populations to enhance understanding and treatment of AD/ADRDs.
Article Synopsis
  • Copy Number Variations (CNVs) are crucial in understanding complex diseases and vary across different populations, necessitating large sample studies for accurate analysis.
  • The CNV-Finder pipeline utilizes deep learning, specifically Long Short-Term Memory (LSTM) networks, to streamline the identification of CNVs in specific genomic areas, making subsequent analyses like genome sequencing more efficient.
  • The tool has been validated with data from various cohorts, focusing on genes related to neurological diseases, and includes an interactive web application for researchers to visualize and refine their findings based on model predictions.
Article Synopsis
  • Latin America's genetic diversity offers a unique opportunity to study Alzheimer's disease (AD) and frontotemporal dementia (FTD), with a focus on identifying related genetic variations.
  • The study involved 2,162 participants from six countries who underwent extensive genomic sequencing and analysis to detect genetic factors linked to these dementias.
  • Results highlighted a mix of American, African, and European ancestries, discovered 17 pathogenic variants, and revealed specific genetic variations tied to AD and FTD inheritance patterns in affected families.
Article Synopsis
  • GenoTools is a Python package designed to simplify population genetics research by integrating key functions like ancestry estimation, quality control, and genome-wide association studies into streamlined pipelines.
  • It allows users to track samples and variants across customizable processes, making it easier to handle genetics data for studies of any size.
  • The tool is utilized in major initiatives like the NIH's Alzheimer's program and has successfully processed vast datasets, contributing to new discoveries and ensuring reliable ancestry predictions and robust quality control in genetic studies.
Article Synopsis
  • The paper explores using Large Language Models (LLMs) to streamline data wrangling and automate tasks in data discovery and harmonization, crucial for making biomedical data AI-ready by developing Common Data Elements (CDEs).
  • A human-in-the-loop approach was used to verify the accuracy of CDEs generated from various studies and databases: 94.0% of fields required no manual changes, and the interoperability mapping rate was 32.4%.
  • The resulting CDEs are designed to improve dataset compatibility by measuring how well different data sources align with these standards, ultimately enhancing the efficiency and scalability of biomedical research efforts.

Background: Commercial genome-wide genotyping arrays have historically neglected coverage of genetic variation across populations.

Objective: We aimed to create a multi-ancestry genome-wide array that would include a wide range of neuro-specific genetic content to facilitate genetic research in neurological disorders across multiple ancestral groups, fostering diversity and inclusivity in research studies.

Methods: We developed the Illumina NeuroBooster Array (NBA), a custom high-throughput and cost-effective platform built on a backbone of 1,914,934 variants from the Infinium Global Diversity Array, and added custom content comprising 95,273 variants associated with more than 70 neurological conditions or traits. We further tested its performance on more than 2,000 patient samples.


Genotyping single nucleotide polymorphisms (SNPs) is fundamental to disease research, as researchers seek to establish links between genetic variation and disease. Although significant advances in genome technology have been made with the development of bead-based SNP genotyping and Genome Studio software, some SNPs still fail to be genotyped, resulting in "no-calls" that impede downstream analyses. To recover these genotypes, we introduce Cluster Buster, a genotyping neural network and visual inspection system designed to improve the quality of neurodegenerative disease (NDD) research.


GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control (QC), and genome-wide association studies (GWAS) capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, users can easily manage genetics data for large and small studies. GenoTools' "Ancestry" module renders highly accurate predictions, allowing for high-quality ancestry-specific studies, and enables custom ancestry model training and serialization, specified to the user's genotyping or sequencing platform.

Article Synopsis
  • Recent FDA approvals for Alzheimer's treatments like lecanemab and aducanumab emphasize the need for understanding biological mechanisms to develop effective therapies for neurodegenerative disorders.
  • The study utilizes genetic data to identify 116 genes associated with Alzheimer's and other neurodegenerative diseases, classifying them based on their potential for drug development.
  • A new web platform is introduced to help researchers easily explore therapeutic targets, encouraging further investigation and collaboration in tackling these challenging diseases.

Genome-wide genotyping platforms have the capacity to capture genetic variation across different populations, but there have been disparities in the representation of population-dependent genetic diversity. The motivation for this work was to create a comprehensive genome-wide array encompassing a wide range of neuro-specific content for the Global Parkinson's Genetics Program (GP2) and the Center for Alzheimer's and Related Dementias (CARD). CARD aims to increase diversity in genetic studies, using this array as a tool to foster inclusivity.

Article Synopsis
  • The study analyzes over one million electronic health records to explore the link between sleep disorders and the development of neurodegenerative diseases (NDDs) like Alzheimer's and Parkinson's later in life.
  • Findings indicate that severe sleep disorders, such as sleep apnea, significantly increase the risk of various NDDs, potentially up to 15 years before symptoms appear.
  • The research suggests that addressing modifiable sleep-related risk factors may help mitigate neurodegeneration risk, particularly in individuals with lower genetic predispositions to these diseases.

Background: An understanding of the genetic mechanisms underlying diseases in ancestrally diverse populations is an important step towards development of targeted treatments. Research in African and African admixed populations can enable mapping of complex traits, because of their genetic diversity, extensive population substructure, and distinct linkage disequilibrium patterns. We aimed to do a comprehensive genome-wide assessment in African and African admixed individuals to better understand the genetic architecture of Parkinson's disease in these underserved populations.


High-dimensional data analysis starts with projecting the data to low dimensions to visualize and understand the underlying data structure. Several methods have been developed for dimensionality reduction, but they are limited to cross-sectional datasets. The recently proposed Aligned-UMAP, an extension of the uniform manifold approximation and projection (UMAP) algorithm, can visualize high-dimensional longitudinal datasets.

Article Synopsis
  • The study focuses on understanding genetic factors contributing to Parkinson's disease (PD) within African and African admixed populations to advance precision medicine.
  • A genome-wide assessment involving nearly 200,000 individuals identified a significant risk factor linked to the gene at locus rs3115534-G, with a strong correlation to PD onset and a mechanism related to gene expression rather than coding mutations.
  • The findings suggest this genetic variant is uniquely prevalent among African ancestries, highlighting the importance of diverse populations in researching complex diseases like PD.
Article Synopsis
  • Neurodegenerative diseases (NDDs) often share similar symptoms and genetic risk factors, indicating a possible interconnectedness among them.
  • This study clusters patients with five major NDDs using genetic data, revealing significant overlaps in genetic causes and supporting the idea of neurodegeneration as a spectrum.
  • The findings suggest that some patients lack common genetic risk factors, hinting at other influences like environmental factors, and emphasize the need for further research to understand how these variants affect disease development and treatment.
Article Synopsis
  • Recent FDA approvals of lecanemab and aducanumab for Alzheimer's disease underscore the need for better treatments for neurodegenerative disorders, especially as the global population ages.
  • This study presents a comprehensive framework to identify therapeutic targets using genetic data and provides insights into the mechanisms of disease, identifying numerous target genes for various conditions including Alzheimer's and Parkinson's disease.
  • A user-friendly web platform is also created to allow researchers and the community to easily explore these therapeutic targets, facilitating future drug discovery and development.

Background: Biallelic pathogenic variants in GBA1 are the cause of Gaucher disease (GD) type 1 (GD1), a lysosomal storage disorder resulting from deficient glucocerebrosidase. Heterozygous GBA1 variants are also a common genetic risk factor for Parkinson's disease (PD). GD manifests with considerable clinical heterogeneity and is also associated with an increased risk for PD.

Article Synopsis
  • Accurate prediction of surgical risks, particularly postoperative reintubation (POR), is crucial for shared decision-making and informed consent among patients and healthcare providers.
  • The study utilized machine learning models on data from the American College of Surgeons to create scoring systems for predicting both early and late instances of POR, identifying key risk factors from an initial set of 37 pre- and perioperative variables.
  • The scoring system derived from logistic regression showed strong accuracy and effectiveness in predicting outcomes, demonstrating that even with fewer risk variables, the models still performed comparably to those using the full dataset, highlighting their potential utility in clinical practice.
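
As a rough illustration of how a points-based score can be derived from a logistic regression, the sketch below uses one common convention: divide each coefficient by the smallest absolute coefficient and round to the nearest integer. This is an assumption for illustration and not necessarily the method used in the study. The factor names, coefficient values, and function names are hypothetical.

```python
def points_from_coefficients(coefs):
    """Convert logistic regression coefficients to integer points.

    Each coefficient is divided by the smallest non-zero absolute
    coefficient and rounded, so the weakest predictor is worth 1 point.
    """
    base = min(abs(b) for b in coefs.values() if b != 0)
    return {name: round(b / base) for name, b in coefs.items()}


def risk_score(points, patient):
    """Sum the points of every risk factor present in the patient."""
    return sum(points[name] for name, present in patient.items() if present)


# Hypothetical coefficients for three made-up binary risk factors.
coefs = {"factor_a": 0.4, "factor_b": 0.8, "factor_c": 1.3}
points = points_from_coefficients(coefs)

# Hypothetical patient with factor_a and factor_c present.
patient = {"factor_a": True, "factor_b": False, "factor_c": True}
print(points, risk_score(points, patient))
```

Rounding to integers trades a little accuracy for a score that can be added up by hand, which is the usual motivation for such systems in clinical settings.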