Combining Partially Overlapping Multi-Omics Data in Databases Using Relationship Matrices.

Deniz Akdemir, Ron Knox, Julio Isidro Y Sánchez

Front Plant Sci

Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland.

Published: July 2020

Private and public breeding programs, along with universities and companies, have generated large amounts of genomic sequence data, raising challenges in data management and analysis.
Detailed phenotype data and increasing genomic data present opportunities to enhance our understanding of quantitative genetics and facilitate research through data harmonization.
The paper proposes a covariance-based method for combining unbalanced omics data, demonstrating its potential in genomic prediction and improving insights into trait relationships from multiple phenotypic experiments.

Private and public breeding programs, as well as companies and universities, have developed genomics technologies that have generated unprecedented amounts of sequence data. The magnitude and complexity of these datasets bring new challenges in data management, query, and analysis, but also an opportunity to use the available data as a whole. Detailed phenotype data, combined with increasing amounts of genomic data, have enormous potential to accelerate the identification of key traits to improve our understanding of quantitative genetics. Data harmonization enables cross-national and international comparative research, facilitating the extraction of new scientific knowledge. In this paper, we address the complex issue of combining high-dimensional and unbalanced omics data. More specifically, we propose a covariance-based method for combining partial datasets in the genotype-to-phenotype spectrum. This method can be used to combine partially overlapping relationship/covariance matrices. We show with applications that our approach might be advantageous over feature-imputation-based approaches; we demonstrate how the method can be used in genomic prediction with heterogeneous marker data, and how to combine data from multiple phenotypic experiments to make inferences about previously unobserved trait relationships. Our results demonstrate that it is possible to harmonize datasets to improve the information available across gene banks, data repositories, and other data resources.
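To make "partially overlapping relationship matrices" concrete, the sketch below sets up two small matrices that share one genotype and merges them with a naive entry-wise average. This is only an illustrative baseline and is not the covariance-based method of the paper, whose details the abstract does not give. The function name `combine_partial`, the genotype labels, and the toy values are all hypothetical.

```python
import math


def combine_partial(partials):
    """Naive baseline (NOT the paper's method): average each pairwise entry
    over the matrices that observe it.

    `partials` is a list of (ids, matrix) pairs, where `matrix` is a square
    list-of-lists relationship matrix whose rows and columns follow `ids`.
    Pairs never observed together in any matrix are left as NaN.
    """
    # Union of genotype ids, in first-seen order.
    all_ids = []
    for ids, _ in partials:
        for g in ids:
            if g not in all_ids:
                all_ids.append(g)
    index = {g: k for k, g in enumerate(all_ids)}
    n = len(all_ids)

    # Accumulate sums and observation counts for every pair of genotypes.
    total = [[0.0] * n for _ in range(n)]
    count = [[0] * n for _ in range(n)]
    for ids, mat in partials:
        for a, ga in enumerate(ids):
            for b, gb in enumerate(ids):
                i, j = index[ga], index[gb]
                total[i][j] += mat[a][b]
                count[i][j] += 1

    combined = [
        [total[i][j] / count[i][j] if count[i][j] else math.nan for j in range(n)]
        for i in range(n)
    ]
    return all_ids, combined


# Hypothetical toy data: two relationship matrices sharing genotype "g2".
K1 = (["g1", "g2"], [[1.0, 0.4], [0.4, 1.0]])
K2 = (["g2", "g3"], [[1.2, 0.3], [0.3, 1.0]])

ids, K = combine_partial([K1, K2])
for g, row in zip(ids, K):
    print(g, row)
```

In this toy case the relationship between "g1" and "g3" stays missing because no single dataset observes both, and an entry-wise average is not guaranteed to yield a valid covariance matrix once such gaps are filled. Inferring those unobserved entries coherently is the kind of problem a covariance-based combination is meant to address.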

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7381228	PMC
http://dx.doi.org/10.3389/fpls.2020.00947	DOI Listing

