We introduce the 12th version of the Molecular Evolutionary Genetics Analysis (MEGA) software. This latest version brings significant improvements that reduce the computational time needed to select optimal substitution models and to conduct bootstrap tests on phylogenies using maximum likelihood (ML) methods. These improvements are achieved by implementing heuristics that minimize likely unnecessary computations.
Phylogenomic analyses of long sequences, consisting of many genes and genomic segments, reconstruct organismal relationships with high statistical confidence. However, inferred relationships can be sensitive to excluding just a few sequences. Currently, there is no direct way to identify fragile relationships and the associated individual gene sequences in species.
Phylogenomic analyses of long sequences, consisting of many genes and genomic segments, infer organismal relationships with high statistical confidence. However, these relationships can be sensitive to excluding just a few sequences. Currently, there is no direct way to identify fragile relationships and the associated individual gene sequences in species.
An individual's chronological age does not always correspond to the health of different tissues in their body, especially in cases of disease. Therefore, estimating and contrasting the physiological age of tissues with an individual's chronological age may be a useful tool to diagnose disease and its progression. In this study, we present novel metrics to quantify the loss of phylogenetic diversity in hematopoietic stem cells (HSCs), which are precursors to most blood cell types and are associated with many blood-related diseases.
A common practice in molecular systematics is to infer phylogeny and then scale it to time by using a relaxed clock method and calibrations. This sequential analysis practice ignores the effect of phylogenetic uncertainty on divergence time estimates and their confidence/credibility intervals. An alternative is to infer phylogeny and times jointly to incorporate phylogenetic errors into molecular dating.
The selection of the optimal substitution model of molecular evolution imposes a high computational burden for long sequence alignments in phylogenomics. We discovered that the analysis of multiple tiny subsamples of site patterns from a full sequence alignment recovers the correct optimal substitution model when sites in the subsample are upsampled to match the total number of sites in the full alignment. The computational costs of maximum-likelihood analyses are reduced by orders of magnitude in the subsample-upsample (SU) approach because the upsampled alignment contains only a small fraction of all site patterns.
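The subsample-upsample step described above can be sketched roughly as follows. This is a minimal illustration, not the published implementation: the function name `subsample_upsample`, the `fraction` parameter, the random toy alignment, and the choice to draw the subsample without replacement are all assumptions made here. Model fitting is omitted; the sketch only shows why the upsampled alignment has the full number of sites but few distinct site patterns.

```python
import random
from collections import Counter

def subsample_upsample(alignment, fraction, rng):
    """Return site-pattern weights for one upsampled subsample.

    alignment: list of equal-length sequence strings (one per taxon).
    fraction:  hypothetical parameter, share of sites kept in the subsample.
    """
    total_sites = len(alignment[0])
    # Each column of the alignment is one site pattern.
    columns = ["".join(col) for col in zip(*alignment)]
    k = max(1, int(total_sites * fraction))
    # Assumption: the tiny subsample is drawn without replacement.
    subsample = rng.sample(columns, k)
    # Upsample with replacement until the full alignment length is matched.
    upsampled = rng.choices(subsample, k=total_sites)
    # Pattern -> count; a likelihood routine would use the counts as weights.
    return Counter(upsampled)

rng = random.Random(1)
# Toy alignment: 4 taxa, 1000 random sites (illustrative data only).
alignment = ["".join(rng.choice("ACGT") for _ in range(1000)) for _ in range(4)]
weights = subsample_upsample(alignment, fraction=0.01, rng=rng)
```

Because a likelihood calculation only needs to visit each distinct pattern once and multiply by its weight, the cost in this sketch scales with the at most 10 patterns retained rather than with all 1000 sites.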
Nat Comput Sci
September 2021
Felsenstein's bootstrap approach is widely used to assess confidence in species relationships inferred from multiple sequence alignments. It resamples sites randomly with replacement to build alignment replicates of the same size as the original alignment and infers a phylogeny from each replicate dataset. The proportion of phylogenies recovering the same grouping of species is its bootstrap confidence limit.
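The resampling procedure described above can be sketched in a few lines. This is a toy illustration under stated assumptions: `closest_partner` is a hypothetical stand-in for real phylogeny inference (it only asks which taxon is nearest to a focal taxon by Hamming distance), and the alignment, function names, and replicate count are invented for the example.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample sites (columns) with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def closest_partner(alignment, focal):
    """Toy stand-in for tree inference: the taxon nearest to `focal`."""
    others = [name for name in alignment if name != focal]
    # Ties are broken alphabetically so the result is deterministic.
    return min(others, key=lambda n: (hamming(alignment[focal], alignment[n]), n))

def bootstrap_support(alignment, focal, partner, n_reps=200, seed=0):
    """Proportion of replicates in which `focal` groups with `partner`."""
    rng = random.Random(seed)
    hits = sum(
        closest_partner(bootstrap_replicate(alignment, rng), focal) == partner
        for _ in range(n_reps)
    )
    return hits / n_reps

# Illustrative data only.
toy = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "AGGTACGTACGTACGTACGA",
    "C": "ACGTTCGTACCTACGAACGA",
    "D": "TCGTTCGAACCTACGAACGA",
}
replicate = bootstrap_replicate(toy, random.Random(0))
support = bootstrap_support(toy, "A", "B")
```

In a real analysis the `closest_partner` call would be replaced by a full tree search on each replicate, which is where almost all of the computational cost lies.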
We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci (such as genes, proteins, genomic segments, and positions) as parameters. Using the Least Absolute Shrinkage and Selection Operator, ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or presence/absence of a trait.
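To illustrate how a LASSO penalty selects only a few loci, here is a minimal sketch using plain least-squares LASSO solved by cyclic coordinate descent. It is a simplification and not the ESL implementation: the loss function, the ±1 feature encoding, the toy matrix, and the names `lasso_cd` and `lam` are assumptions chosen for the example.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0:
                continue
            # Correlation of feature j with the residual excluding feature j.
            rho = sum(col[i] * (resid[i] + col[i] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / norm
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= col[i] * delta
                w[j] = new_w
    return w

# Toy data: 8 species (rows) x 3 loci (columns), encoded as +1/-1.
# The response marks clade membership and matches locus 0 exactly.
X = [
    [1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1],
    [-1, 1, 1], [-1, -1, 1], [-1, 1, -1], [-1, -1, -1],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = lasso_cd(X, y, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
```

In this toy case only locus 0 receives a nonzero weight; the uninformative loci are shrunk exactly to zero, which is the property that makes the penalty useful for selecting loci.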
Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient.
We report the likely most recent common ancestor of SARS-CoV-2 - the coronavirus that causes COVID-19. This progenitor SARS-CoV-2 genome was recovered through a novel application and advancement of computational methods initially developed to reconstruct the mutational history of tumor cells in a patient. The progenitor differs from the earliest coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections.