An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data.

BMC Genomics

Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, Yunnan, 666303, China.

Published: July 2015

AI Article Synopsis

  • Next-generation sequencing is producing vast genomic data, but analyzing it for phylogenetic reconstruction is challenging, especially for non-model organisms due to difficulties in genome assembly and alignment.
  • The new Assembly and Alignment-Free (AAF) method builds phylogenetic trees directly from unassembled genome sequence data. It addresses homoplasy, sequencing errors, and incomplete sequencing coverage using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes.
  • Testing the AAF method on 12 mammal genomes and 21 tropical tree genomes demonstrates its ability to work effectively with low-coverage data, paving the way for phylogenomic studies in non-model species.

Article Abstract

Background: Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging.

Results: To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method ( https://sourceforge.net/projects/aaf-phylogeny ) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms.

Conclusion: Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.
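
The abstract does not spell out how pairwise distances between genomes are obtained from raw reads. As a rough illustration only, the Python sketch below assumes a k-mer presence/absence approach: each sample's reads are broken into k-mers, rare k-mers are discarded as likely sequencing errors, and the distance between two samples is derived from the fraction of k-mers they share. The function names, the `min_count` filter, and the exact distance formula are assumptions made for this example and should not be read as the published AAF implementation; reverse-complement handling is also left out for brevity.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_set(reads, k, min_count=1):
    """Return the set of k-mers seen at least min_count times across all reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    # Dropping rare k-mers is a simple (assumed) guard against sequencing errors.
    return {kmer for kmer, n in counts.items() if n >= min_count}


def shared_kmer_distance(kmers_a, kmers_b, k):
    """Assumed distance: (1/k) * ln(size of smaller set / shared k-mers)."""
    smaller = min(len(kmers_a), len(kmers_b))
    shared = len(kmers_a & kmers_b)
    if smaller == 0 or shared == 0:
        return math.inf  # nothing in common: distance is undefined/unbounded
    return math.log(smaller / shared) / k


def distance_matrix(samples, k, min_count=1):
    """Map each ordered pair of sample names to its pairwise distance."""
    names = sorted(samples)
    sets = {name: kmer_set(samples[name], k, min_count) for name in names}
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = shared_kmer_distance(sets[a], sets[b], k)
        dist[(a, b)] = dist[(b, a)] = d
    return dist
```

A matrix like this could then be passed to any distance-based tree-building method (for example neighbor joining). With low-coverage data, a higher `min_count` removes more errors but also discards genuine k-mers, which is the kind of trade-off the abstract's parameter optimization is concerned with.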


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4501066
DOI: http://dx.doi.org/10.1186/s12864-015-1647-5

Publication Analysis

Top Keywords

  • assembly alignment-free: 8
  • next-generation sequencing: 8
  • aaf method: 8
  • sequencing: 6
  • genome: 5
  • assembly: 4
  • alignment-free method: 4
  • method phylogeny: 4
  • phylogeny reconstruction: 4
  • reconstruction next-generation: 4

Similar Publications

Background: Traditional supervised learning methods applied to DNA sequence taxonomic classification rely on the labor-intensive and time-consuming step of labelling the primary DNA sequences. Additionally, standard DNA classification/clustering methods involve time-intensive multiple sequence alignments, which impacts their applicability to large genomic datasets or distantly related organisms. These limitations indicate a need for robust, efficient, and scalable unsupervised DNA sequence clustering methods that do not depend on sequence labels or alignment.


k-mer frequencies are crucial for understanding DNA sequence patterns and structure, with applications in motif discovery, genome classification, and short read assembly. However, the exponential increase in the dimension of frequency tables with increasing k-mer length poses storage challenges. In this study, we present a novel method for compressing k-mer data without information loss, aiming to optimize storage and analysis processes.

Article Synopsis
  • The human major histocompatibility complex (MHC) has high genetic diversity, making traditional reference-based alignment methods for DNA sequence assembly less effective.
  • MHConstructor is a new tool that uses a short-read, de novo assembly algorithm specifically for MHC data, allowing for improved assembly in large population studies.
  • This pipeline is unique in offering a reproducible, alignment-free method for analyzing MHC sequences, making it more accessible for researchers.

Intrinsically disordered regions (IDRs) are structurally flexible protein segments with regulatory functions in multiple contexts, such as in the assembly of biomolecular condensates. Since IDRs undergo more rapid evolution than ordered regions, identifying homology of such poorly conserved regions remains challenging for state-of-the-art alignment-based methods that rely on position-specific conservation of residues. Thus, systematic functional annotation and evolutionary analysis of IDRs have been limited, despite them comprising ~21% of proteins.


Studies of bacterial adaptation and evolution are hampered by the difficulty of measuring traits such as virulence, drug resistance, and transmissibility in large populations. In contrast, it is now feasible to obtain high-quality complete assemblies of many bacterial genomes thanks to scalable high-accuracy long-read sequencing technologies. To exploit this opportunity, we introduce a phenotype- and alignment-free method for discovering coselected and epistatically interacting genomic variation from genome assemblies covering both core and accessory parts of genomes.

