Microbiome and omics datasets are, by their intrinsic biological nature, high-dimensional, consisting of counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional because the total number of counts identified in a sample is irrelevant, so only the relative values carry information. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric, that is, they do not exactly reproduce the geometry of all pairwise logratios, but their departure from isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundances makes the subsequent interpretation of the logratios even easier. The additive logratio transformation was performed on each of three high-dimensional omics datasets, using references identified according to these criteria. For each dataset the compositional data structure was successfully reproduced; that is, the additive logratios were very close to being isometric, with Procrustes correlations of 0.9991, 0.9974, and 0.9902, respectively. We thus demonstrate, for high-dimensional compositional data, that additive logratios can be a valid choice of transformed variables, which (a) are subcompositionally coherent, (b) explain 100% of the total logratio variance, and (c) come measurably very close to being isometric.
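A minimal sketch of the reference search described above, under stated assumptions: the data are strictly positive (zeros would have to be replaced first), the "exact logratio geometry" is represented by centred logratios (CLRs) with equal part weights, and the Procrustes correlation is the symmetric version, computed as the sum of singular values after centring and scaling both configurations to unit sum of squares. The function names (`closure`, `alr`, `clr`, `procrustes_correlation`, `rank_references`), the rounding-based tie-break between the two criteria, and the simulated data are hypothetical illustrative choices, not the authors' code.

```python
import numpy as np


def closure(X):
    """Rescale each sample (row) to proportions summing to 1."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)


def alr(X, ref):
    """Additive logratios log(x_j / x_ref) for every part j != ref."""
    logP = np.log(closure(X))
    return np.delete(logP, ref, axis=1) - logP[:, [ref]]


def clr(X):
    """Centred logratios. With equal part weights (an assumption here), CLR
    distances reproduce the geometry of all pairwise logratios."""
    logP = np.log(closure(X))
    return logP - logP.mean(axis=1, keepdims=True)


def procrustes_correlation(A, B):
    """Symmetric Procrustes correlation between two sample configurations
    (rows = samples); the numbers of columns may differ."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)  # Frobenius norm -> unit total sum of squares
    B = B / np.linalg.norm(B)
    # The best rotation of B onto A gives an inner product equal to the sum
    # of the singular values of A^T B, which lies between 0 and 1.
    return float(np.linalg.svd(A.T @ B, compute_uv=False).sum())


def rank_references(X, decimals=4):
    """Return (ref, procrustes_r, var_log_ref) for every candidate reference,
    sorted by highest r (rounded to `decimals`), then by lowest log-variance."""
    logP = np.log(closure(X))
    exact = clr(X)
    rows = []
    for ref in range(logP.shape[1]):
        r = procrustes_correlation(alr(X, ref), exact)
        v = float(logP[:, ref].var(ddof=1))
        rows.append((ref, r, v))
    return sorted(rows, key=lambda t: (-round(t[1], decimals), t[2]))


# Hypothetical strictly positive data: 50 samples x 30 parts, no zeros.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=rng.normal(0.0, 1.0, size=30), sigma=0.5, size=(50, 30))

best_ref, best_r, best_var = rank_references(X)[0]
print(f"reference part {best_ref}: Procrustes r = {best_r:.4f}, "
      f"var(log ref) = {best_var:.4f}")
```

Because each additive logratio is a difference of CLRs (alr_j = clr_j - clr_ref), every choice of reference gives a basis for the same logratio space, and any pairwise logratio is recovered exactly as log(x_j/x_k) = alr_j - alr_k. What changes with the reference is only how faithfully the sample distances are preserved, which is what the Procrustes correlation scores.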
The interpretation of additive logratios is much simpler than that of the more complex isometric alternatives and, when the variance of the log-transformed reference is very low, simpler still, since each additive logratio can then be identified with a corresponding compositional component.
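A small illustration of why a low-variance reference helps interpretation: since log(x_j / x_ref) = log x_j - log x_ref, a reference whose log relative abundance is nearly constant across samples makes each additive logratio essentially a shifted copy of log x_j. The function `alr_identification` and the simulated data, in which part 0 is built as a near-constant fraction of every sample, are hypothetical; the correlation check is one simple way to see the identification, not the paper's procedure.

```python
import numpy as np


def log_proportions(X):
    """Log of each part's relative abundance within its sample."""
    X = np.asarray(X, dtype=float)
    return np.log(X / X.sum(axis=1, keepdims=True))


def alr_identification(X, ref):
    """For reference `ref`, return the variance of its log relative abundance
    and, for every other part j, corr(log(x_j / x_ref), log x_j)."""
    logP = log_proportions(X)
    var_ref = float(logP[:, ref].var(ddof=1))
    corrs = {
        j: float(np.corrcoef(logP[:, j] - logP[:, ref], logP[:, j])[0, 1])
        for j in range(logP.shape[1])
        if j != ref
    }
    return var_ref, corrs


# Hypothetical data: parts 1..9 vary freely, while part 0 is constructed to be
# an almost constant fraction of each sample (a "stable" reference).
rng = np.random.default_rng(42)
others = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 9))
stable = 0.2 * others.sum(axis=1) * rng.lognormal(mean=0.0, sigma=0.01, size=100)
X = np.column_stack([stable, others])

for ref in (0, 1):
    var_ref, corrs = alr_identification(X, ref)
    print(f"ref {ref}: var(log ref) = {var_ref:.5f}, "
          f"min corr(ALR_j, log x_j) = {min(corrs.values()):.3f}")
```

With the stable part as reference, the correlations are close to 1, so each additive logratio can be read as the (log) component itself; with a highly variable reference they drop, because every logratio then also carries the reference's own fluctuations.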

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8561721
DOI: http://dx.doi.org/10.3389/fmicb.2021.727398
