Analyzing human genomic data from biobanks and large-scale genetic evaluations often requires fitting models with a sample size exceeding the number of DNA markers used (n > p). For instance, developing Polygenic Scores (PGS) for humans and genomic prediction for genetic evaluations of agricultural species may require fitting models involving a few thousand SNPs using data with hundreds of thousands of samples. In such cases, computations based on sufficient statistics are more efficient than those based on individual genotype-phenotype data.
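As an illustration of why sufficient statistics help when n > p, the sketch below fits a ridge regression using only X'X (p × p) and X'y (p × 1), so that the n × p genotype matrix is read once. The choice of ridge regression, the simulated data, and the names `sufficient_stats` and `ridge_from_stats` are assumptions made for this example; they are not taken from the article.

```python
import numpy as np

def sufficient_stats(X, y):
    """Compute X'X and X'y once from individual-level data."""
    return X.T @ X, X.T @ y

def ridge_from_stats(XtX, Xty, lam):
    """Solve (X'X + lam * I) b = X'y using only p x p and p x 1 inputs."""
    p = XtX.shape[0]
    return np.linalg.solve(XtX + lam * np.eye(p), Xty)

# Simulated data with n much larger than p (illustrative values only).
rng = np.random.default_rng(0)
n, p, lam = 5000, 20, 10.0
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

# After this step the n x p matrix is no longer needed.
XtX, Xty = sufficient_stats(X, y)
b_stats = ridge_from_stats(XtX, Xty, lam)

# Reference fit from individual-level data (augmented least squares),
# included only to check that both routes give the same estimates.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
b_full, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
```

Once X'X and X'y are stored, refitting with a different penalty costs only a p × p solve, regardless of n.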
The overwhelming majority of participants in genome-wide association studies (GWAS) have European (EUR) ancestry, and polygenic scores (PGS) derived from EURs often perform poorly in non-EURs. Previous studies suggest that between-ancestry differences in allele frequencies and linkage disequilibrium are significant contributors to the poor portability of PGS in cross-ancestry prediction. We hypothesize that the portability of (local) PGS varies significantly over the genome.
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023 the first open-to-the-public Genomes to Fields (G2F) initiative Genotype by Environment (GxE) prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements and field management notes, gathered by the project over nine years.
Variable selection and large-scale hypothesis testing are techniques commonly used to analyze high-dimensional genomic data. Despite recent advances in theory and methodology, variable selection and inference with highly collinear features remain challenging. For instance, collinearity poses a great challenge in genome-wide association studies involving millions of variants, many of which may be in high linkage disequilibrium.
Population studies have shown that the infant's microbiome and metabolome undergo significant changes in early childhood. However, no previous study has investigated how diverse these changes are across subjects and whether the subject-specific dynamics of some microbes correlate with the over-time dynamics of specific metabolites. Using mixed-effects models, and data from the ABC study, we investigated the early childhood dynamics of fecal microbiome and metabolome and identified 83 amplicon sequence variants (ASVs) and 753 metabolites with seemingly coordinated trajectories.
Magnetoreceptive biology as a field remains relatively obscure; compared with the breadth of species believed to sense magnetic fields, it remains under-studied. Here, we present grounds for the expansion of magnetoreception studies among teleosts. We begin with the electromagnetic perceptive gene (EPG) from and expand to identify 72 teleosts with homologous proteins containing a conserved three-phenylalanine (3F) motif.
Lipid metabolism plays an important role in maternal health and fetal development. There is a gap in the knowledge of how lipid metabolism changes during pregnancy for Black women who are at a higher risk of adverse outcomes. We hypothesized that the comprehensive lipidome profiles would show variation across pregnancy indicative of requirements during gestation and fetal development.
Magnetoreceptive biology as a field remains relatively obscure; compared to the breadth of species believed to sense magnetic fields, it remains under-studied. Here, we present grounds for the expansion of magnetoreception studies among Teleosts. We begin with the electromagnetic perceptive gene (EPG) from and expand to identify 72 Teleosts with homologous proteins containing a conserved three-phenylalanine (3F) motif.
Many genetic models (including models for epistatic effects as well as genetic-by-environment) involve covariance structures that are Hadamard products of lower rank matrices. Implementing these models requires factorizing large Hadamard product matrices. The available algorithms for factorization do not scale well for big data, making the use of some of these models not feasible with large sample sizes.
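To make the structure concrete, the sketch below checks a standard linear-algebra identity: if K1 = AA' and K2 = BB' are low-rank covariance matrices, their Hadamard product equals ZZ', where each row of Z is the Kronecker product of the matching rows of A and B. This is only an illustration of what a "Hadamard product of lower rank matrices" looks like; it is not the algorithm proposed in the article, and the helper name `face_splitting` and the simulated factors are assumptions.

```python
import numpy as np

def face_splitting(A, B):
    """Row-wise Kronecker product: row i of the result is kron(A[i], B[i])."""
    n = A.shape[0]
    return (A[:, :, None] * B[:, None, :]).reshape(n, -1)

# Simulated low-rank factors (illustrative sizes only).
rng = np.random.default_rng(1)
n, r1, r2 = 200, 3, 4
A = rng.standard_normal((n, r1))
B = rng.standard_normal((n, r2))

K1 = A @ A.T          # rank <= r1
K2 = B @ B.T          # rank <= r2
K = K1 * K2           # Hadamard (element-wise) product, n x n

# Factor with r1 * r2 columns such that Z Z' reproduces K.
Z = face_splitting(A, B)
```

The factor Z has r1 × r2 columns, so its width grows multiplicatively with the ranks of the inputs, which hints at why naive factorization becomes expensive at scale.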
The human brain grows quickly during infancy and early childhood, but factors influencing brain maturation in this period remain poorly understood. To address this gap, we harmonized data from eight diverse cohorts, creating one of the largest pediatric neuroimaging datasets to date focused on birth to 6 years of age. We mapped the developmental trajectory of intracranial and subcortical volumes in ∼2,000 children and studied how sociodemographic factors and adverse birth outcomes influence brain structure and cognition.
Genotype-by-environment (G×E) interactions can significantly affect crop performance and stability. Investigating G×E requires extensive data sets with diverse cultivars tested over multiple locations and years. The Genomes-to-Fields (G2F) Initiative has tested maize hybrids in more than 130 year-locations in North America since 2014.
Information on dry matter intake (DMI) and energy balance (EB) at the animal and herd level is important for management and breeding decisions. However, routine recording of these traits at commercial farms can be challenging and costly. Fourier-transform mid-infrared (FT-MIR) spectroscopy is a noninvasive technique applicable to a large cohort of animals that is routinely used to analyze milk components and is convenient for predicting complex phenotypes that are typically difficult and expensive to obtain on a large scale.
It is increasingly assumed that there is no one-size-fits-all approach to dietary recommendations for the management and treatment of chronic diseases such as obesity. The observation that not all individuals respond uniformly to a given treatment has become an area of research interest given the rise of personalized and precision medicine. Conducting, interpreting, and disseminating this research rigorously and with scientific accuracy, however, requires an understanding of treatment response heterogeneity.
Background: Most genomic prediction applications in animal breeding use genotypes with tens of thousands of single nucleotide polymorphisms (SNPs). However, modern sequencing technologies and imputation algorithms can generate ultra-high-density genotypes (including millions of SNPs) at an affordable cost. Empirical studies have not produced clear evidence that using ultra-high-density genotypes can significantly improve prediction accuracy.
Objectives: The Genomes to Fields (G2F) 2022 Maize Genotype by Environment (GxE) Prediction Competition aimed to develop models for predicting grain yield for the 2022 Maize GxE project field trials, leveraging the datasets previously generated by this project and other publicly available data.
Data Description: This resource used data from the Maize GxE project within the G2F Initiative [1]. The dataset included phenotypic and genotypic data of the hybrids evaluated in 45 locations from 2014 to 2022.
Mendelian randomization (MR) has become a common tool used in epidemiological studies. However, when confounding variables are correlated with the instrumental variable (in this case, a genetic variant/marker), the estimation can remain biased even with MR. We propose conditioning on parental mating types (a function of parental genotypes) in MR to eliminate the need for one set of assumptions, thereby plausibly reducing such bias.
Whole-genome multi-omics profiles contain valuable information for the characterization and prediction of complex traits in plants. In this study, we evaluate multi-omics models to predict four complex traits in barley: grain yield, thousand kernel weight, protein content, and nitrogen uptake. Genomic, transcriptomic, and DNA methylation data were obtained from 75 spring barley lines tested in the RadiMax semi-field phenomics facility under control and water-scarce treatment.
The BGLR-R package implements various types of single-trait shrinkage/variable selection Bayesian regressions. The package was first released in 2014 and has since become widely used in genomic studies. We recently developed functionality for multitrait models.
Hyperuricemia (serum urate >6.8 mg/dl) is associated with several cardiometabolic and renal diseases, such as gout and chronic kidney disease. Previous studies have examined the shared genetic basis of chronic kidney disease and hyperuricemia in humans either using single-variant tests or estimating whole-genome genetic correlations between the traits.
Modern GWAS use enormous sample sizes and ultra-high-density SNP genotypes. These conditions reduce the mapping resolution of marginal association tests, the method most often used in GWAS. Multi-locus Bayesian Variable Selection (BVS) offers a one-stop solution for powerful and precise mapping of risk variants and polygenic risk score (PRS) prediction.
The average American consumes more than 50% of their total dietary energy from ultra-processed foods (UPFs). From a nutritional standpoint, as UPF intake increases, fiber, vitamin, and mineral intake decreases. High consumption of UPFs, mainly from fast foods (FF) and ready-to-eat (RTE) food items, emerges as a critical public health concern linking nutritional quality and food safety.
Introduction: The infection fatality rate (IFR) of COVID-19 has been carefully measured and analysed in high-income countries, whereas there has been no systematic analysis of age-specific seroprevalence or IFR for developing countries.
Methods: We systematically reviewed the literature to identify all COVID-19 serology studies in developing countries that were conducted using representative samples collected by February 2021. For each of the antibody assays used in these serology studies, we identified data on assay characteristics, including the extent of seroreversion over time.