Analyzing human genomic data from biobanks and large-scale genetic evaluations often requires fitting models with a sample size exceeding the number of DNA markers used (n > p). For instance, developing Polygenic Scores (PGS) for humans and genomic prediction for genetic evaluations of agricultural species may require fitting models involving a few thousand SNPs using data with hundreds of thousands of samples. In such cases, computations based on sufficient statistics are more efficient than those based on individual genotype-phenotype data. Additionally, software that admits sufficient statistics as inputs can be used to analyze data from multiple sources jointly without the need to share individual genotype-phenotype data. Therefore, we developed functionality within the BGLR R-package that generates posterior samples for Bayesian shrinkage and variable selection models from sufficient statistics. In this article, we present an overview of the new methods incorporated in the BGLR R-package, demonstrate the use of the new software through simple examples, and provide several computational benchmarks. We also present a real-data example using data from the UK-Biobank, All of Us, and the HCHS/SOL cohort. The example demonstrates how a joint analysis from multiple cohorts can be implemented without sharing individual genotype-phenotype data, and how a combined analysis can improve the prediction accuracy of PGS for Hispanics, a group severely underrepresented in GWAS data.
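The role of sufficient statistics described above can be illustrated with a minimal sketch. It is not the BGLR implementation and does not use the BGLR API: it swaps the Bayesian sampler for a plain ridge regression (a point estimate), uses simulated data, and all function names are hypothetical. It shows only that the cross-products X'X and X'y are p x p and p x 1 regardless of n, and that summing them across cohorts gives the same fit as stacking the individual-level data.

```python
import random

def cross_products(X, y):
    """Return the sufficient statistics (X'X, X'y, n) for one cohort."""
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return XtX, Xty, len(y)

def add_stats(a, b):
    """Pool two cohorts by summing their sufficient statistics."""
    XtX = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    Xty = [u + v for u, v in zip(a[1], b[1])]
    return XtX, Xty, a[2] + b[2]

def ridge_from_stats(XtX, Xty, lam):
    """Solve (X'X + lam*I) b = X'y by Gaussian elimination with partial pivoting."""
    p = len(Xty)
    # Augmented matrix [X'X + lam*I | X'y]
    A = [[XtX[j][k] + (lam if j == k else 0.0) for k in range(p)] + [Xty[j]]
         for j in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= f * A[c][k]
    # Back substitution
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (A[j][p] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return b

def simulate_cohort(n, effects, rng):
    """Simulate genotypes coded 0/1/2 and a phenotype with Gaussian noise (toy data)."""
    X = [[float(rng.choice((0, 1, 2))) for _ in effects] for _ in range(n)]
    y = [sum(x * e for x, e in zip(row, effects)) + rng.gauss(0.0, 1.0) for row in X]
    return X, y

rng = random.Random(1)
effects = [0.5, -0.3, 0.0, 0.2]  # assumed SNP effects, for illustration only
X1, y1 = simulate_cohort(300, effects, rng)
X2, y2 = simulate_cohort(200, effects, rng)

# Each cohort shares only its p x p and p x 1 summaries, never X or y.
pooled_stats = add_stats(cross_products(X1, y1), cross_products(X2, y2))
b_stats = ridge_from_stats(pooled_stats[0], pooled_stats[1], lam=1.0)

# Reference: the same fit computed from the stacked individual-level data.
full = cross_products(X1 + X2, y1 + y2)
b_full = ridge_from_stats(full[0], full[1], lam=1.0)

# Largest absolute difference between the two sets of estimates.
max_diff = max(abs(u - v) for u, v in zip(b_stats, b_full))
```

Once the cross-products are formed, the cost of the solve depends on p only, which is why this route pays off when n is much larger than p.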
DOI: http://dx.doi.org/10.1093/g3journal/jkae288
G3 (Bethesda)
December 2024
Departments of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA.
Genetics
August 2022
Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA.
The BGLR R-package implements various types of single-trait shrinkage/variable selection Bayesian regressions. The package was first released in 2014; since then, it has become widely used software in genomic studies. We recently developed functionality for multitrait models.
G3 (Bethesda)
August 2018
Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil.
One of the major issues in plant breeding is the occurrence of genotype × environment (GE) interaction. Several models have been created to understand this phenomenon and explore it. In the genomic era, several models were employed to improve selection by using markers and account for GE interaction simultaneously.
J Dairy Sci
March 2018
Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro PD, Italy.
Data on Holstein (16,890), Brown Swiss (31,441), Simmental (25,845), and Alpine Grey (12,535) cows reared in northeastern Italy were used to assess the ability of milk components (fat, protein, casein, and lactose) and Fourier transform infrared (FTIR) spectral data to diagnose pregnancy. Pregnancy status was defined as whether a pregnancy was confirmed by a subsequent calving and no other subsequent inseminations within 90 d of the breeding of specific interest. Milk samples were analyzed for components and FTIR full-spectrum data using a MilkoScan FT+ 6000 (Foss Electric, Hillerød, Denmark).
J Dairy Sci
November 2015
Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell'Università 16, 35020 Legnaro, Italy.
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g.