Phylogenomics with paralogs.

Marc Hellmuth Nicolas Wieseke Marcus Lechner Hans-Peter Lenhof Martin Middendorf Peter F Stadler

Proc Natl Acad Sci U S A

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center of Bioinformatics, Leipzig University, D-04107 Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany; Institute for Theoretical Chemistry, University of Vienna, A-1090 Vienna, Austria; Center for Non-Coding RNA in Technology and Health, University of Copenhagen, 1870 Frederiksberg C, Denmark; and Santa Fe Institute, Santa Fe, NM 87501.

Published: February 2015

Phylogenomics relies heavily on well-curated sequence data sets that contain, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building on recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided that orthologs and paralogs can be distinguished with a reasonable degree of certainty. Starting from tree-free estimates of orthology, cograph editing can reduce the noise sufficiently to recover correct event-annotated gene trees. The information contained in these gene trees can then be translated directly into constraints on the species tree. Although the resolution obtained from individual gene families is very poor, we show that genome-wide data sets suffice to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.
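
The chain of reasoning in the abstract (orthology graph, cograph, event-annotated gene tree, species-tree constraints, species tree) can be illustrated with a small, hedged sketch. This is not the authors' implementation and ignores noise entirely. Its assumptions: gene trees are encoded as nested tuples `(event, children)` with `event` either `"spec"` or `"dup"` (a hypothetical format); two genes count as orthologs when their last common ancestor is a speciation node; the cograph test is a brute-force search for an induced path on four vertices (P4); a species triple σ(a)σ(b)|σ(c) is inferred whenever lca(a,b,c) is a speciation node lying strictly above lca(a,b) and the three species are distinct; and the species tree is assembled with the classical BUILD algorithm of Aho et al. (1981), which fails if the triples conflict. The helper names (`orthology_edges`, `species_triples`, `build`, and so on) are illustrative only.

```python
from itertools import combinations

# Hypothetical encoding: a leaf is a gene name (str); an inner node is a
# tuple (event, children) where event is "spec" (speciation) or "dup" (duplication).

def leaves(node):
    """All gene names below a node."""
    if isinstance(node, str):
        return [node]
    return [g for child in node[1] for g in leaves(child)]

def orthology_edges(node):
    """Gene pairs whose last common ancestor is a speciation node."""
    edges = set()
    if isinstance(node, str):
        return edges
    event, children = node
    if event == "spec":
        groups = [leaves(ch) for ch in children]
        for g1, g2 in combinations(groups, 2):
            edges |= {frozenset((x, y)) for x in g1 for y in g2}
    for child in children:
        edges |= orthology_edges(child)
    return edges

def is_cograph(vertices, edges):
    """Brute force: a graph is a cograph iff no 4 vertices induce a path P4."""
    for quad in combinations(vertices, 4):
        degrees = sorted(
            sum(frozenset((u, v)) in edges for v in quad if v != u) for u in quad
        )
        if degrees == [1, 1, 2, 2]:  # on 4 vertices, only P4 has this degree sequence
            return False
    return True

def species_triples(node, sigma):
    """Triples (x, y, z) meaning xy|z, read off at speciation nodes."""
    triples = set()
    if isinstance(node, str):
        return triples
    event, children = node
    if event == "spec":
        groups = [leaves(ch) for ch in children]
        for i, inner in enumerate(groups):
            outer = [g for j, grp in enumerate(groups) if j != i for g in grp]
            for a, b in combinations(inner, 2):  # lca(a, b) lies below this node
                for c in outer:                  # lca(a, b, c) is this node
                    x, y, z = sigma[a], sigma[b], sigma[c]
                    if len({x, y, z}) == 3:
                        triples.add((min(x, y), max(x, y), z))
    for child in children:
        triples |= species_triples(child, sigma)
    return triples

def build(species, triples):
    """BUILD: nested-tuple species tree, or None if the triples conflict."""
    species = set(species)
    if len(species) == 1:
        return next(iter(species))
    rel = [t for t in triples if set(t) <= species]
    adj = {s: set() for s in species}
    for a, b, _ in rel:  # Aho graph: join a and b for every triple ab|c
        adj[a].add(b)
        adj[b].add(a)
    components, seen = [], set()
    for s in sorted(species):  # connected components by depth-first search
        if s in seen:
            continue
        stack, comp = [s], set()
        seen.add(s)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        components.append(comp)
    if len(components) == 1:
        return None  # connected Aho graph: no tree displays all triples
    subtrees = [build(comp, rel) for comp in components]
    return None if None in subtrees else tuple(subtrees)

# Toy data (invented for illustration): one family with a duplication, one without.
fam1 = ("dup", [("spec", [("spec", ["a1", "b1"]), "c1"]),
                ("spec", [("spec", ["a2", "b2"]), "d2"])])
fam2 = ("spec", [("spec", ["c3", "d3"]), "a3"])
sigma = {"a1": "A", "b1": "B", "c1": "C", "a2": "A", "b2": "B", "d2": "D",
         "c3": "C", "d3": "D", "a3": "A"}

print(is_cograph(leaves(fam1), orthology_edges(fam1)))  # True
r1 = species_triples(fam1, sigma)
print(sorted(r1))           # [('A', 'B', 'C'), ('A', 'B', 'D')]
print(build("ABCD", r1))    # (('A', 'B'), 'C', 'D')
print(build("ABCD", r1 | species_triples(fam2, sigma)))  # (('A', 'B'), ('C', 'D'))
```

In this toy run the duplicated family alone yields a multifurcating tree, and adding a second family resolves it, mirroring the abstract's point about single families versus genome-wide data. Real orthology estimates are noisy, so a practical pipeline would also need cograph editing and a way to select a consistent subset of conflicting triples, both of which this sketch omits.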

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4343152
DOI: http://dx.doi.org/10.1073/pnas.1412770112

Similar Publications

Abbreviated Duration of Vasoactive Agents Has Similar Outcomes as Standard Duration of Therapy in Patients with Liver Cirrhosis and Variceal Bleeding: An Individual Patient Data Meta-Analysis.

Dig Dis Sci

January 2025

Department of Gastroenterology and Human Nutrition Unit, All India Institute of Medical Sciences, New Delhi, India.

Sagnik Biswas Gin-Ho Lo Shubham Mehta Anshuman Elhence Yu Jun Wong

Background: This two-stage individual patient data meta-analysis (IPD-MA) compared the efficacy of a shorter duration (≤ 2 days) of vasoactive (VA) drug therapy to standard duration (3-5 days) after acute variceal bleeding (AVB) in patients with liver cirrhosis.

Patients and Methods: Randomized clinical trials in patients with cirrhosis and AVB undergoing endoscopic band ligation that compared a short duration with the standard duration of VA therapy were included. The primary outcome was the 5-day rebleeding rate.

Insights Into Causal Effects of Genetically Proxied Lipids and Lipid-Modifying Drug Targets on Cardiometabolic Diseases.

J Am Heart Assoc

January 2025

Center for Non-Communicable Disease Management Beijing Children's Hospital, Capital Medical University, National Center for Children's Health Beijing China.

Liwan Fu Qin Liu Hong Cheng Xiaoyuan Zhao Jingfan Xiong

Background: The differential impact of serum lipids and their targets for lipid modification on cardiometabolic disease risk is debated. This study used Mendelian randomization to investigate the causal relationships and underlying mechanisms.

Methods: Genetic variants related to lipid profiles and targets for lipid modification were sourced from the Global Lipids Genetics Consortium.

[Development of prognostic clinical and genetic models of the risk of low bone mineral density using neural network training].

Probl Endokrinol (Mosk)

January 2024

Endocrinology Research Centre; Institute of Biochemistry and Genetics-Subdivision of the Ufa Federal Research Centre of the Russian Academy of Sciences.

B I Yalaev A V Novikov I R Minniakhmetov R I Khusainova

Background: Osteoporosis is a common age-related disease with disabling consequences. Its long, latent course makes early diagnosis difficult, so the disease is often diagnosed only after a fracture. Great expectations are therefore placed on machine learning methods for predicting osteoporosis at an early stage, including methods that use large data sets of genetic and clinical predictors of the disease. Nevertheless, including DNA markers in prediction models poses a number of difficulties owing to the complex polygenic and heterogeneous nature of the disease.

Single-Walled Carbon Nanotube Probes for Protease Characterization Directly in Cell-Free Expression Reactions.

bioRxiv

January 2025

Chemical and Biological Engineering - Iowa State University, 618 Bissell Rd, Ames, IA 50011.

Sepehr Hejazi Ryan Godin Vito Jurasic Nigel F Reuel

Proteins can be rapidly prototyped with cell-free expression (CFE), but in most cases no probes or assays are available to measure their function directly in the cell lysate, which limits the throughput of these screens. Higher throughput is needed to build standardized sequence-to-function data sets for machine-learning-guided protein optimization. Herein, we describe the use of fluorescent single-walled carbon nanotubes (SWCNT) as effective probes for measuring protease activity directly in cell-free lysate.

Fully Synthetic Data for Complex Surveys.

Surv Methodol

December 2024

Department of Statistical Science, 214a Old Chemistry Building, Duke University, Durham, NC 27708-0251.

Shirley Mathur Yajuan Si Jerome P Reiter

When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our approach adheres to the general strategy proposed by Rubin (1993).
