Publications by authors named "Xihong Lin"

Linear mixed-effects models (LMMs) and ridge regression are commonly applied in genetic association studies to control for population structure and sample relatedness. To control for sample relatedness, existing methods use empirical genetic relatedness matrices (GRMs), either explicitly or conceptually. This works well in mostly homogeneous populations; however, in multi-ancestry heterogeneous populations, GRMs are confounded with population structure, which leads to inflated type I error rates, massively increased computation, and reduced power.

Article Synopsis
  • The text discusses the serious global health issues of lung cancer and tobacco use and introduces the GREAT care paradigm, which uses polygenic risk scores (PRSs) to enhance cancer prevention and encourage healthier behaviors in patients.
  • Researchers developed standardized PRSs using extensive genetic data from diverse populations and validated them in a large sample, revealing significant risk factors for lung cancer and challenges in quitting smoking across different groups.
  • The PRS-based intervention aims to integrate genetic risk assessments into primary care, with plans for evaluation through clinical trials, potentially leading to better prevention strategies for lung cancer and more effective tobacco treatments.

Polygenic risk scores are widely used in disease risk stratification, but their accuracy varies across diverse populations. Recent methods leverage large-scale multi-ancestry data to improve accuracy in under-represented populations but require labelling individuals by ancestry for prediction. This poses challenges for practical use, as clinical practices are typically not based on ancestry.


Motivation: Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole-genome variant functional annotations, which are essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and the functional prioritization of disease-associated variants. A versatile chatbot is needed to facilitate informative interpretation and interactive, user-centric summaries of the whole-genome variant functional annotation data in the FAVOR database.

Results: We have developed FAVOR-GPT, a generative natural language interface powered by integrating large language models (LLMs) and FAVOR.

Article Synopsis
  • The study investigates how rare non-coding genetic variations affect complex traits, specifically focusing on human height by analyzing data from over 333,100 individuals across three large datasets.
  • Researchers found 29 significant rare variants linked to height, with impacts ranging from a decrease of 7 cm to an increase of 4.7 cm, after considering previously known variants.
  • The team also identified specific non-coding variants near key genes associated with height, demonstrating a new method for understanding the effects of rare variants in regulatory regions using whole-genome sequencing.

Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial step in quality control procedures to remove low quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous.
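To illustrate why the homogeneity assumption matters, the sketch below implements the standard 1-df Pearson chi-square test of HWE from genotype counts. It is a generic textbook test, not the method proposed in the article; the function name `hwe_chi2` and the toy genotype counts are assumptions made for illustration. Two subpopulations that are each in HWE, but have different allele frequencies, fail the test once pooled because of a heterozygote deficit.

```python
import math


def hwe_chi2(n_hom_ref, n_het, n_hom_alt):
    """Pearson chi-square test of HWE (1 df) from biallelic genotype counts.

    Returns (chi2_statistic, p_value). Hypothetical helper for illustration.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Reference allele frequency estimated from the sample.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic variant: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (assumed): each subpopulation is exactly in HWE on its own.
pop1 = (10, 180, 810)   # reference allele frequency 0.1
pop2 = (810, 180, 10)   # reference allele frequency 0.9
pooled = tuple(a + b for a, b in zip(pop1, pop2))

chi2_pop1, p_pop1 = hwe_chi2(*pop1)
chi2_pooled, p_pooled = hwe_chi2(*pooled)
```

Here `chi2_pop1` is essentially zero, while `chi2_pooled` is about 819.2, so a variant of good quality would be flagged only because the pooled sample mixes ancestries.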


Associations of biological aging with the development and mortality of cardiometabolic multimorbidity (CMM) remain unclear. Here we conducted a multistate analysis in 341,159 adults of the UK Biobank. CMM was defined as the coexistence of two or three cardiometabolic diseases (CMDs), including type 2 diabetes, ischemic heart disease and stroke.


Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods face several limitations, including computational burden, limited predictive accuracy, and poor adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures.
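For orientation, a generic L0L2-penalized least-squares objective can be written as below. This is the textbook individual-level form and is not necessarily the exact objective that ALL-Sum optimizes from summary statistics; the symbols y (phenotype), X (genotype matrix), n (sample size), p (number of variants), and the tuning parameters λ0 and λ2 are notational assumptions.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda_0 \,\lVert \beta \rVert_0
  \;+\; \lambda_2 \,\lVert \beta \rVert_2^2
```

The L0 term counts nonzero coefficients and so controls sparsity, while the L2 term shrinks the retained effects. Varying the two tuning parameters produces a family of candidate models, which is the kind of grid an ensembling step can aggregate over.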

Article Synopsis
  • The study investigates how arsenic exposure affects the mouse placenta using single-cell RNA sequencing to uncover changes in gene expression and function across various cell types.
  • A key finding is the significant upregulation of the Prap1 gene, which encodes a protein that appears to provide protective effects against arsenic toxicity, particularly in female placental cells.
  • The research highlights the potential for understanding how environmental toxins impact fetal development and suggests new strategies for preventing and treating related health issues in mothers and their children.

The KRAS mutation is the most common oncogenic driver in patients with non-small cell lung cancer (NSCLC). However, how self-reported race and/or ethnicity (SIRE), genetically inferred ancestry (GIA), and their interaction affect KRAS mutation remains largely unknown. Here, we investigated the associations between SIRE, quantitative GIA, and KRAS mutation and its allele-specific subtypes in a multi-ethnic cohort of 3,918 patients from the Boston Lung Cancer Survival cohort and the Chinese OrigiMed cohort, with an independent validation cohort of 1,450 patients with NSCLC.


Within population biobanks, incomplete measurement of certain traits limits the power for genetic discovery. Machine learning is increasingly used to impute the missing values from the available data. However, performing genome-wide association studies (GWAS) on imputed traits can introduce spurious associations, identifying genetic variants that are not associated with the original trait.

Article Synopsis
  • Inflammation biomarkers offer crucial insights into the inflammatory processes linked to various diseases, and their sequencing can help reveal the genetic makeup of these traits.
  • A study analyzed 21 inflammation biomarkers from around 38,465 individuals, discovering 22 significant associations across 6 inflammatory traits after considering existing findings.
  • The research combined single-variant and rare variant analyses, identifying additional significant associations and highlighting the complexity and diversity of genetic influences on inflammation traits across different ancestries.
Article Synopsis
  • The text discusses the challenges of lung cancer and tobacco use, introducing a new approach called the GREAT care paradigm that utilizes polygenic risk scores (PRSs) for better risk assessment and personalized interventions in diverse patient populations.
  • PRSs were developed using data from large-scale genetic studies and tested on over 561,000 individuals, revealing significant correlations between high PRS scores and increased odds of lung cancer and difficulty quitting smoking.
  • The study aims to evaluate this PRS-based model in clinical trials, potentially enhancing prevention strategies and tobacco cessation efforts by incorporating genetic insights into primary care.
Article Synopsis
  • Epstein-Barr virus (EBV) and specific human leukocyte antigen (HLA) gene variations are important risk factors for nasopharyngeal carcinoma (NPC).
  • A study in southern China used a causal inference framework to analyze how these genetic factors and EBV interact to influence NPC risk.
  • Findings revealed strong interaction effects between high-risk EBV subtypes and certain HLA variations, suggesting that addressing these factors together could significantly reduce NPC risk.

Background: Although the polygenic risk score (PRS) has emerged as a promising tool for predicting cancer risk from genome-wide association studies (GWAS), the individual-level accuracy of lung cancer PRS and the extent of its impact on subsequent clinical applications remain largely unexplored.

Methods: Lung cancer PRSs and confidence/credible intervals (CIs) were constructed for each individual using two statistical approaches: (1) the weighted sum of 16 GWAS-derived significant SNP loci, with the CI obtained through bootstrapping (PRS-16-CV), and (2) LDpred2, with the CI obtained through posterior sampling (PRS-Bayes), among 17,166 lung cancer cases and 12,894 controls with European ancestry from the International Lung Cancer Consortium. Individuals were classified into different genetic risk subgroups based on the relationship between their own PRS mean/PRS CI and the population-level threshold.
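The weighted-sum construction and the CI-based classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the resampling scheme, so the sketch swaps in a parametric resampling of effect sizes from their standard errors, and the function names, three-SNP toy inputs, and subgroup labels are all hypothetical.

```python
import random


def weighted_prs(dosages, betas):
    """PRS as the sum of effect-allele dosages weighted by GWAS effect sizes."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(b * g for b, g in zip(betas, dosages))


def resampled_ci(dosages, betas, ses, n_draws=2000, level=0.95, seed=0):
    """Percentile interval for one individual's PRS.

    Assumption: effect sizes are redrawn from Normal(beta, se); the source
    only says the CI was obtained by bootstrapping.
    """
    rng = random.Random(seed)
    draws = sorted(
        weighted_prs(dosages, [rng.gauss(b, s) for b, s in zip(betas, ses)])
        for _ in range(n_draws)
    )
    alpha = (1.0 - level) / 2.0
    low = draws[int(alpha * n_draws)]
    high = draws[min(n_draws - 1, int((1.0 - alpha) * n_draws))]
    return low, high


def classify(ci_low, ci_high, threshold):
    """Compare an individual's PRS interval with a population-level threshold."""
    if ci_low > threshold:
        return "above"      # whole interval exceeds the threshold
    if ci_high < threshold:
        return "below"      # whole interval falls under the threshold
    return "uncertain"      # interval straddles the threshold


# Toy inputs (assumed): three SNPs instead of sixteen.
dosages = [0, 1, 2]
betas = [0.10, 0.25, -0.05]
ses = [0.02, 0.03, 0.01]

score = weighted_prs(dosages, betas)
ci_low, ci_high = resampled_ci(dosages, betas, ses)
label = classify(ci_low, ci_high, threshold=0.0)
```

Classifying on the interval rather than the point estimate leaves an explicit "uncertain" group for individuals whose PRS cannot be confidently placed on either side of the threshold.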


The COVID-19 pandemic influenced emotional experiences globally. We examined daily positive and negative affect between May/June 2020 and February 2021 (N = 151,049; 3,509,982 observations) using a convenience sample from a national mobile application-based survey that asked for daily affect reports. Four questions were examined: (1) How did people in the United States feel from May/June 2020 to February 2021?; (2) What demographic variables are related to positive and negative affect?; (3) What is the relationship between experienced stressors and daily affect?; and (4) What is the relationship between daily affect and preventive behavior? Positive affect increased and negative affect decreased over time.


Existing SNP-heritability estimators that leverage summary statistics from genome-wide association studies (GWAS) are much less efficient (i.e., have larger standard errors) than restricted maximum likelihood (REML) estimators, which require access to individual-level data.


Background: Individuals with type 2 diabetes (T2D) have an increased risk of coronary artery disease (CAD), but questions remain about the underlying pathology. Identifying which CAD loci are modified by T2D in the development of subclinical atherosclerosis (coronary artery calcification [CAC], carotid intima-media thickness, or carotid plaque) may improve our understanding of the mechanisms leading to the increased CAD in T2D.

Methods: We compared the common and rare variant associations of known CAD loci from the literature on CAC, carotid intima-media thickness, and carotid plaque in up to 29 670 participants, including up to 24 157 normoglycemic controls and 5513 T2D cases leveraging whole-genome sequencing data from the Trans-Omics for Precision Medicine program.

Article Synopsis
  • Large-scale whole-genome sequencing (WGS) studies have enhanced our understanding of how rare genetic variants affect complex human traits through better analysis techniques.
  • Current methods for analyzing multiple traits are limited in their ability to handle rare variants in large WGS datasets, prompting the development of MultiSTAAR.
  • MultiSTAAR enables more powerful analysis by considering relatedness, population structure, and the correlation between traits, leading to the discovery of new genetic associations in lipid traits that single-trait analyses missed.

Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based L0L2 penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage.

Article Synopsis
  • Long non-coding RNAs (lncRNAs) play crucial roles in regulating lipid metabolism and have been studied in relation to genetic variants and complex traits.
  • This research utilized high-coverage whole-genome sequencing of over 66,000 diverse participants to assess how rare variants in lncRNA genes affect blood lipid levels, using a statistical framework to analyze the associations.
  • The study found 83 lncRNA variants significantly linked to lipid levels, with many being independent of common genetic variations, and replicated a majority of these findings with data from another large cohort.

Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc.

Article Synopsis
  • Inflammation biomarkers play a crucial role in understanding diseases and can reveal insights into genetic traits through whole-genome sequencing studies.
  • A comprehensive analysis of 21 inflammation biomarkers in over 38,000 individuals found 22 significant single-variant associations across six different inflammatory traits, indicating the complexity and diversity of these biomarkers.
  • The study also included rare variant analyses, identifying 19 additional significant associations, which highlights the importance of using multiple analytical approaches to enhance the understanding of inflammation-related traits across different ancestries.
Article Synopsis
  • Obesity poses a significant public health challenge and is linked to high mortality rates, with prior studies focusing mostly on European populations.
  • This research utilized whole-genome sequencing data from a diverse group of 88,873 individuals, finding 18 new signals associated with body mass index (BMI) and highlighting a novel SNP prevalent among people of African descent.
  • The study emphasizes the importance of diverse genetic data in identifying new obesity-related variants, moving us closer to personalized medical interventions for this crisis.