Genome Data Exploration Using Correspondence Analysis.

Bioinform Biol Insights

Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France.

Published: June 2016

Recent developments in sequencing technologies that produce massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine such large data sets and help discover the biologically meaningful knowledge they contain. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables that contain some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences have been drawn from large data tables by reducing their dimensionality to a few orthogonal factors that account for the largest amount of variability in the data. Herein, I review CA and highlight its use through examples of high-dimensional data constructed from genomic and genetic studies. The examples include the amino acid compositions of large sets of species (viruses, phages, yeasts, and fungi) and the pairwise shared orthologs among a set of yeast and fungal species, as obtained from comparisons of their proteomes. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with principal component analysis is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid, which was compared with modern human populations in a search for ancestral similarities. CA offers more detailed results, highlighting links between modern humans and the ancient hominid and their characterizations.
Compared with the popular principal component analysis method, CA allows easier and more effective interpretation of results, particularly through its ability to relate individual patterns to their corresponding characteristic variables.
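The abstract describes CA only in words. As a hedged illustration (not the author's code), the sketch below computes CA in the standard way, through a singular value decomposition of the standardized residuals of a contingency table, using NumPy. The function names `correspondence_analysis` and `rows_as_barycenters` and the toy count table are hypothetical. The second function checks the transition (barycentric) formula, one property often cited for CA's ease of interpretation: each row point lies at the profile-weighted average of the column points, so a row can be read directly against the variables that characterize it.

```python
import numpy as np


def correspondence_analysis(table, n_factors=2):
    """Minimal CA of a two-way contingency table via SVD of standardized residuals."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix (sums to 1)
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)                # P expected under row/column independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_factors
    return {
        "P": P,
        "row_masses": r,
        # Row principal coordinates: one point per row (e.g. per species).
        "rows": U[:, :k] * sv[:k] / np.sqrt(r)[:, None],
        # Column standard coordinates: one point per column (e.g. per amino acid).
        "cols_std": Vt.T[:, :k] / np.sqrt(c)[:, None],
        # Principal inertias (eigenvalues) of the first k orthogonal factors.
        "inertias": sv[:k] ** 2,
        # Total inertia, equal to the chi-square statistic divided by the grand total.
        "total_inertia": float(np.sum(sv ** 2)),
    }


def rows_as_barycenters(ca):
    """Transition formula: each row point is the average of the column points,
    weighted by that row's profile. Holds for factors with nonzero inertia."""
    row_profiles = ca["P"] / ca["row_masses"][:, None]   # each profile sums to 1
    return row_profiles @ ca["cols_std"]


# Hypothetical counts: 4 species (rows) x 3 amino acids (columns).
counts = [[30, 10, 5], [28, 12, 6], [5, 20, 30], [6, 18, 33]]
ca = correspondence_analysis(counts, n_factors=2)
print(ca["inertias"] / ca["total_inertia"])   # share of total inertia per factor
print(ca["rows"])                              # species coordinates on factors 1-2
```

The signs of the factors are arbitrary (the SVD may flip them), so only relative positions matter. In the review's own examples the rows would be species and the columns amino acid counts or shared orthologs; the small table here only stands in for such data.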


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4898644
DOI: http://dx.doi.org/10.4137/BBI.S39614
