Evaluation of various distance computation methods for construction of haplotype-based phylogenies from large MLST datasets.

Mol Phylogenet Evol

Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA.

Published: December 2022

Multi-locus sequence typing (MLST) is widely used to investigate genetic relationships among eukaryotic taxa, including parasitic pathogens. MLST analysis workflows typically involve construction of alignment-based phylogenetic trees, i.e., trees whose structures are computed from nucleotide differences observed in a multiple sequence alignment (MSA). Notably, alignment-based phylogenetic methods require that each isolate/taxon be represented by a single sequence. When multiple loci are sequenced, these sequences may be concatenated to produce one tree that includes information from all loci. Alignment-based phylogenetic techniques are robust and widely used, yet they have some shortcomings, including how heterozygous sites are handled, intolerance for missing data (i.e., partial genotypes), and differences in the way insertions-deletions (indels) are scored/treated during tree construction. In certain contexts, 'haplotype-based' methods may represent a viable alternative to alignment-based techniques, as they do not have these limitations. This is primarily because haplotype-based methods assess genetic similarity based on the number of shared (i.e., intersecting) haplotypes, as opposed to similarities in nucleotide composition observed in an MSA. For haplotype-based comparisons, choosing an appropriate distance statistic is fundamental, and several statistics are available. However, a comprehensive assessment of the available statistics for their ability to produce a robust haplotype-based phylogenetic reconstruction has not yet been performed. We evaluated seven distance statistics by applying them to extant MLST datasets from the gastrointestinal parasite Cyclospora cayetanensis and two species of pathogenic nematode of the genus Strongyloides. We compared the genetic relationships identified using each statistic to epidemiologic, geographic, and host metadata. We show that Barratt's heuristic definition of genetic distance was the most robust among the statistics evaluated. We therefore propose that Barratt's heuristic is a useful approach for challenging MLST datasets possessing features (i.e., high heterozygosity, partial genotypes, and indel or repeat-based polymorphisms) that confound or preclude the use of alignment-based methods.
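
To illustrate what a distance based on shared haplotypes can look like, the following is a minimal sketch using a Jaccard distance over sets of haplotypes. It is not Barratt's heuristic, and the abstract does not state whether Jaccard distance is among the seven statistics evaluated. The function name, the genotype representation (locus name mapped to a set of haplotype labels), and the choice to compare only loci typed in both isolates are assumptions made for this example.

```python
from typing import Dict, Optional, Set

# Hypothetical representation: locus name -> set of haplotype labels observed
# at that locus. Heterozygous loci simply carry more than one label.
Genotype = Dict[str, Set[str]]


def jaccard_haplotype_distance(a: Genotype, b: Genotype) -> Optional[float]:
    """Jaccard distance over haplotypes at loci typed in both isolates.

    Returns None when the isolates have no typed locus in common, since
    no comparison is possible in that case (an assumption of this sketch).
    """
    # Restrict to loci present in both genotypes so that partial
    # genotypes do not count as differences.
    shared_loci = a.keys() & b.keys()

    # Tag each haplotype with its locus so identical labels at
    # different loci are never treated as the same haplotype.
    haps_a = {(locus, h) for locus in shared_loci for h in a[locus]}
    haps_b = {(locus, h) for locus in shared_loci for h in b[locus]}

    union = haps_a | haps_b
    if not union:
        return None

    # 1 - |intersection| / |union|
    return 1.0 - len(haps_a & haps_b) / len(union)


# Illustrative data only; these isolates and labels are invented.
isolate_1 = {"locusA": {"h1", "h2"}, "locusB": {"h1"}}
isolate_2 = {"locusA": {"h1"}, "locusC": {"h3"}}
isolate_3 = {"locusD": {"h9"}}

d12 = jaccard_haplotype_distance(isolate_1, isolate_2)
d11 = jaccard_haplotype_distance(isolate_1, isolate_1)
d13 = jaccard_haplotype_distance(isolate_1, isolate_3)
```

In this sketch, a heterozygous locus needs no special handling because it is just a set with more than one haplotype, and a locus missing from one isolate is skipped rather than scored as a mismatch. A pairwise matrix of such distances could then be passed to a clustering or tree-building step.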


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10127246
DOI: http://dx.doi.org/10.1016/j.ympev.2022.107608
