Publications by authors named "Alexey M Kozlov"

Article Synopsis
  • Research shows that by adjusting specific numerical thresholds in tools like RAxML-NG and IQ-TREE, tree inference times can be significantly improved without compromising the accuracy of the resulting trees.
  • The study provides access to datasets and scripts used in the research, allowing others to replicate or build upon their findings via the provided links.
Article Synopsis
  • Phylogenetic networks are used to model complex evolutionary scenarios, but existing methods struggle with high computational demands and small datasets, especially when considering incomplete lineage sorting (ILS).
  • The team introduces NetRAX, a maximum likelihood tool that simplifies network inference by avoiding ILS complications, utilizing efficient tree likelihood computations, and outputting results in Extended Newick format.
  • NetRAX performs well on simulated data, providing accurate network inference quickly, and is available for use under an open-source license on GitHub.
Article Synopsis
  • Phylogenetic inference is increasingly performed on powerful computing systems, leading to challenges in ensuring parallel fault tolerance for tools like RAxML-NG.
  • The study explores the software modifications needed to achieve fault tolerance, finding that the added recovery mechanisms lead to a slowdown of about 1.7 times on large datasets but ensure reliable recovery from multiple types of failures.
  • The updated RAxML-NG with fault tolerance is publicly available for use, promoting easier access to improved phylogenetic analysis methods.
Article Synopsis
  • Analyses of SARS-CoV-2 data are published daily, including phylogenetic studies available on nextstrain.org.
  • The authors discuss challenges in creating reliable phylogenies due to a high number of virus sequences but a low number of mutations, making it tough to draw clear evolutionary connections.
  • They conclude that while phylogenetic methods can offer some insights into COVID-19's evolution and spread, researchers should interpret results with caution, especially when using standard analysis tools.
Article Synopsis
  • Inferring phylogenetic trees for individual gene families is challenging due to short alignments and ineffective substitution models, so methods that incorporate species tree information are necessary.
  • GeneRax is introduced as the first maximum likelihood software that considers both sequence-level and gene-level events, such as duplications and transfers, for phylogenetic inference.
  • In simulations, GeneRax accurately infers trees in 90% of cases and is the fastest method for empirical data, effectively performing large-scale analyses, as demonstrated by its ability to process 1,099 Cyanobacteria families in just 8 minutes.
Article Synopsis
  • Advances in microfluidics and low sequencing costs have revolutionized single-cell sequencing technology, allowing for the analysis of thousands to millions of cells in one experiment.
  • This rapid data generation presents unique challenges in data science, which the text identifies as central to the future of single-cell biology.
  • The article provides an overview of eleven key challenges, including motivating research questions and open problems, making it relevant for both experienced researchers and newcomers to the field.
Article Synopsis
  • Researchers created a faster and more memory-efficient version of the transfer bootstrap expectation (TBE) method for phylogenetic analysis, addressing limitations of the original, resource-heavy tool.
  • Their new implementation can be up to 480 times quicker and uses significantly less memory, making it better for large datasets.
  • This optimized TBE method has been integrated into existing tools and is available for public use under an open-source license.
Article Synopsis
  • ModelTest-NG is a new tool designed from the ground up to improve upon jModelTest and ProtTest, which are used for selecting substitution models for nucleotide and amino acid sequences.
  • It is one to two orders of magnitude faster than its predecessors while maintaining equal accuracy, and it adds new features such as bias correction and automatic processing.
  • ModelTest-NG is open source and can be accessed under the GNU GPL3 license at its GitHub page.
Article Synopsis
  • Researchers developed a new model for amino acid sequence evolution that incorporates protein structure, which is often overlooked despite its importance.
  • This "structurally aware" model uses an expanded alphabet to describe amino acids along with their side-chain configurations, taking into account geometric patterns and dihedral angles.
  • The new model outperforms traditional models in estimating evolutionary divergence and reconstructing ancestral states, highlighting the significance of side-chain geometry for understanding protein folding and function in evolutionary biology.
Article Synopsis
  • Phylogenies are crucial for biological research and have applications in fields like biotechnology and medicine, but finding optimal trees using maximum likelihood methods is computationally challenging.
  • RAxML-NG is a new implementation that improves upon the previous RAxML/ExaML algorithm, offering better accuracy, speed, and new features while analyzing complex datasets.
  • The RAxML-NG code is accessible under GNU GPL, with a web service available for users, and additional supplementary data is provided online.
Article Synopsis
  • Hemipteroid insects, which make up over 10% of insect diversity, are important in ecosystems, but their evolutionary relationships remained unclear in past studies.
  • Recent phylogenomic analyses of 193 hemipteroid insect samples offer a clearer phylogeny, confirming the monophyly of the three main orders: Psocodea, Thysanoptera, and Hemiptera, and suggesting Thysanoptera is closely related to Hemiptera.
  • The study also indicates that hemipteroid insects began diversifying over 365 million years ago and discusses the impact of these findings on understanding insect evolution and traits.
Article Synopsis
  • Coalescent- and reconciliation-based methods are increasingly used to infer species phylogenies from genomic data, but current tools are inefficient for processing large multiple sequence alignments (MSAs).
  • The new tool, ParGenes, allows researchers to simultaneously conduct model testing and maximum likelihood inference on thousands of MSAs, significantly speeding up the analysis.
  • ParGenes has been tested successfully, processing over 20,000 phylogenetic gene trees in just 28 hours on a supercomputing cluster, and is available for public use on GitHub.
Article Synopsis
  • Next-generation sequencing (NGS) technologies have produced an overwhelming amount of molecular sequence data, which particularly complicates metagenetics, the field that deals with identifying sequences from various microbial environments.
  • Traditional phylogenetic placement methods, like EPA and PPLACER, struggle with scalability due to advancements in NGS, prompting the development of a faster, more efficient tool called EPA-NG.
  • EPA-NG can operate on both shared and distributed memory systems, showcasing impressive performance by processing 1 billion metagenetic reads within 7 hours on a computing cluster, while significantly outpacing previous algorithms.
Article Synopsis
  • Creating a comprehensive and sustainable plant tree of life is becoming possible but faces challenges due to issues with current data integration and accessibility for non-experts.
  • Existing phylogenetic trees are often static and quickly outdated, highlighting the need for a collaborative and adaptable framework for integrating DNA data and conducting phylogenetic analyses.
  • The scientific community should focus on developing user-friendly interfaces for data access, regular updates of phylogenetic trees, and enhancing data quality through user feedback to achieve effective global phylogenetic synthesis.
Article Synopsis
  • Public databases often contain molecular sequences annotated by the original authors, leading to potential mislabeling and errors that are difficult to detect, which can negatively impact metagenetic studies.
  • The research introduces SATIVA, a method that uses phylogenetic analysis to automatically identify and correct taxonomically mislabeled sequences, achieving high accuracy rates in both identification and correction.
  • Analysis of popular microbial reference databases reveals a significant presence of mislabels, ranging from 0.2% to 2.5%, and SATIVA provides a tool for exploring better taxonomic classifications, specifically for Cyanobacteria.
Article Synopsis
  • Phylogenetic analysis is becoming crucial in medical and biological research, fueled by the rapid growth of datasets from next-generation sequencing.
  • ExaML version 3 has been developed as a high-performance tool for inferring phylogenies from large datasets, including whole-transcriptome and whole-genome alignments, utilizing supercomputers.
  • The latest version introduces enhancements like new substitution models, a novel load balance algorithm for better performance, and is optimized for Intel MIC-based hardware platforms.