Background: A central problem of computational metagenomics is determining the correct placement into an existing phylogenetic tree of individual reads (nucleotide sequences of varying lengths, ranging from hundreds to thousands of bases) obtained using next-generation sequencing of DNA samples from a mixture of known and unknown species. Correct placement allows us to easily identify or classify the sequences in the sample as to taxonomic position or function.
Results: Here we propose a novel method (PhyClass), based on the Minimum Evolution (ME) phylogenetic inference criterion, for determining the appropriate phylogenetic position of each read.
We present a procedure to test the effect of calibration priors on estimated times, which applies a recently developed calibration-free method (RelTime) that produces relative divergence times for all nodes in the tree. We illustrate this protocol by applying it to a timetree of metazoan diversification (Erwin DH, Laflamme M, Tweedt SM, Sperling EA, Pisani D, Peterson KJ. 2011.
Scientists are assembling sequence data sets from increasing numbers of species and genes to build comprehensive timetrees. However, data are often unavailable for some species and gene combinations, and the proportion of missing data is often large for data sets containing many genes and species. Surprisingly, there has not been a systematic analysis of the effect of the degree of sparseness of the species-gene matrix on the accuracy of divergence time estimates.
We announce the release of an advanced version of the Molecular Evolutionary Genetics Analysis (MEGA) software, which currently contains facilities for building sequence alignments, inferring phylogenetic histories, and conducting molecular evolutionary analysis. In version 6.0, MEGA now enables the inference of timetrees, as it implements the RelTime method for estimating divergence times for all branching points in a phylogeny.
Molecular dating of species divergences has become an important means to add a temporal dimension to the Tree of Life. Increasingly larger datasets encompassing greater taxonomic diversity are becoming available to generate molecular timetrees by using sophisticated methods that model rate variation among lineages. However, the practical application of these methods is challenging because of the exorbitant calculation times required by current methods for contemporary data sizes, the difficulty in correctly modeling the rate heterogeneity in highly diverse taxonomic groups, and the lack of reliable clock calibrations and their uncertainty distributions for most groups of species.
Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (P value).
Modern technologies have made the sequencing of personal genomes routine. They have revealed thousands of nonsynonymous (amino acid altering) single nucleotide variants (nSNVs) of protein-coding DNA per genome. What do these variants foretell about an individual's predisposition to diseases? The experimental technologies required to carry out such evaluations at a genomic scale are not yet available.
The rapid expansion of sequence data and the development of statistical approaches that embrace varying evolutionary rates among lineages have encouraged many more investigators to use DNA and protein data to time species divergences. Here, we report results from a systematic evaluation, by means of computer simulation, of the performance of two frequently used relaxed-clock methods for estimating these times and their credibility intervals (CrIs). These relaxed-clock methods allow rates to vary in a phylogeny randomly over lineages (e.
As the cost of DNA sequencing drops, we are moving beyond one genome per species to one genome per individual to improve prevention, diagnosis, and treatment of disease by using personal genotypes. Computational methods are frequently applied to predict impairment of gene function by nonsynonymous mutations in individual genomes and single nucleotide polymorphisms (nSNPs) in populations. These computational tools are, however, known to fail 15%-40% of the time.
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution.
DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is mere common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding additional sequences as stepping stones between distantly related species.
Proc Natl Acad Sci U S A
December 2005
Molecular clocks have been used to date the divergence of humans and chimpanzees for nearly four decades. Nonetheless, this date and its confidence interval remain to be firmly established. In an effort to generate a genomic view of the human-chimpanzee divergence, we have analyzed 167 nuclear protein-coding genes and built a reliable confidence interval around the calculated time by applying a multifactor bootstrap-resampling approach.
Based on published information, we have identified 991 genes and gene-family clusters for cattle and 764 for pigs that have orthologues in the human genome. The relative linear locations of these genes on human sequence maps were used as "rulers" to annotate bovine and porcine genomes based on a CSAM (contiguous sets of autosomal markers) approach. A CSAM is an uninterrupted set of markers in one genome (primary genome; the human genome in this study) that is syntenic in the other genome (secondary genome; the bovine and porcine genomes in this study).
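The CSAM definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Marker` type, the `find_csams` function, and the toy marker names and chromosome labels are all hypothetical, and the sketch assumes that "syntenic" simply means "assigned to the same chromosome in the secondary genome". Under that assumption, a CSAM is a maximal run of markers that are adjacent in the primary genome and share one secondary-genome chromosome.

```python
from itertools import groupby
from typing import List, NamedTuple


class Marker(NamedTuple):
    """Hypothetical record for one orthologous marker (illustration only)."""
    name: str
    primary_chrom: str    # chromosome in the primary genome (e.g. human)
    primary_pos: int      # linear position on the primary sequence map
    secondary_chrom: str  # chromosome assignment in the secondary genome


def find_csams(markers: List[Marker]) -> List[List[str]]:
    """Group markers into maximal uninterrupted runs along the primary genome
    whose members all map to the same secondary-genome chromosome."""
    # Order markers along the primary genome, chromosome by chromosome.
    ordered = sorted(markers, key=lambda m: (m.primary_chrom, m.primary_pos))
    # groupby only merges *consecutive* items with equal keys, so a marker
    # mapping elsewhere in the secondary genome interrupts the run.
    return [
        [m.name for m in run]
        for _, run in groupby(
            ordered, key=lambda m: (m.primary_chrom, m.secondary_chrom)
        )
    ]


# Toy data (invented for illustration, not taken from the study).
toy = [
    Marker("A", "chr1", 100, "sec3"),
    Marker("B", "chr1", 200, "sec3"),
    Marker("C", "chr1", 300, "sec16"),
    Marker("D", "chr1", 400, "sec3"),
    Marker("E", "chr2", 50, "sec3"),
]
print(find_csams(toy))  # [['A', 'B'], ['C'], ['D'], ['E']]
```

In the toy data, marker C interrupts the run of `sec3` markers on `chr1`, so A-B and D fall into separate sets even though they share a secondary chromosome.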