The internal transcribed spacer region (ITS) of the nuclear rDNA cistron represents the barcoding locus for Fungi. Intragenomic variation of this multicopy gene can interfere with accurate phylogenetic reconstruction of biological entities. We investigated the amount and nature of this variation for the lichenized fungus Cora inversa in the Hygrophoraceae (Basidiomycota: Agaricales), analyzing base call and length variation in ITS1 454 pyrosequencing data of three samples of the target mycobiont, for a total of 16,665 reads obtained from three separate repeats of the same samples under different conditions. Using multiple fixed alignment methods (PaPaRa) and maximum likelihood phylogenetic analysis (RAxML), we assessed phylogenetic relationships of the obtained reads, together with Sanger ITS sequences from the same samples. Phylogenetic analysis showed that all ITS1 reads belonged to a single species, C. inversa. Pyrosequencing data showed 266 insertion sites in addition to the 325 sites expected from Sanger sequences, for a total of 15,654 insertions (0.94 insertions per read). An additional 3,279 substitutions relative to the Sanger sequences were detected in the dataset, out of 5,461,125 bases to be called. Up to 99.3% of the observed indels in the dataset could be interpreted as 454 pyrosequencing errors, approximately 65% corresponding to incorrectly recovered homopolymer segments, and 35% to carry-forward-incomplete-extension errors. Comparison of automated clustering and alignment-based phylogenetic analysis demonstrated that clustering of these reads produced a 35-fold overestimation of biological diversity in the dataset at the 95% similarity threshold level, whereas phylogenetic analysis using a maximum likelihood approach accurately recovered a single biological entity. 
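The per-read and per-base figures quoted above can be rechecked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the variable names are illustrative and are not taken from the study.

```python
# Counts quoted in the abstract (variable names are illustrative assumptions).
reads = 16_665                 # total ITS1 454 reads
insertions = 15_654            # insertions relative to the Sanger sequences
substitutions = 3_279          # substitutions relative to the Sanger sequences
bases_called = 5_461_125       # bases to be called, as stated in the abstract
sanger_sites = 325             # sites expected from the Sanger sequences
extra_insertion_sites = 266    # additional insertion sites in the 454 data

# Average number of insertions carried by each read.
insertions_per_read = insertions / reads

# Fraction of called bases that differ from the Sanger sequences by substitution.
substitution_rate = substitutions / bases_called

# Sites once every insertion site is counted alongside the Sanger sites.
total_sites = sanger_sites + extra_insertion_sites

print(f"insertions per read: {insertions_per_read:.2f}")
print(f"substitutions per called base: {substitution_rate:.6f}")
print(f"sites including insertion sites: {total_sites}")
```

The first value reproduces the 0.94 insertions per read reported in the abstract. The substitution rate is a derived quantity that the abstract does not state directly.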
We conclude that variation detected in 454 pyrosequencing data must be interpreted with great care and that a combination of a sufficiently large number of reads per taxon, a set of Sanger references for the same taxon, and at least two runs under different emulsion PCR and sequencing conditions is necessary to reliably separate biological variation from 454 sequencing errors. Our study shows that clustering methods are highly sensitive to artifactual sequence variation and inadequate to properly recover biological diversity in a dataset if sequencing errors are substantial and not removed prior to clustering analysis.
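As a rough illustration of how insertions relative to a Sanger reference might be screened for homopolymer-type errors, the sketch below applies a simple heuristic: an inserted base counts as homopolymer-related when it duplicates the nearest reference base on either side. This is an assumption made for illustration, not the classification procedure used in the study, and the function names and the toy alignment are hypothetical.

```python
from typing import Optional


def nearest_base(seq: str, start: int, step: int) -> Optional[str]:
    """Return the nearest non-gap character from `start`, moving by `step`."""
    i = start + step
    while 0 <= i < len(seq):
        if seq[i] != "-":
            return seq[i]
        i += step
    return None


def classify_insertions(ref_aln: str, read_aln: str) -> dict:
    """Count read insertions that duplicate an adjacent reference base.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Heuristic only (an assumption of this sketch): an insertion is labelled
    'homopolymer' when the inserted base equals the nearest reference base
    to its left or right, and 'other' in every remaining case.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_aln, read_aln = ref_aln.upper(), read_aln.upper()
    counts = {"homopolymer": 0, "other": 0}
    for i, (ref_char, read_char) in enumerate(zip(ref_aln, read_aln)):
        # An insertion is a column with a gap in the reference and a base in the read.
        if ref_char != "-" or read_char == "-":
            continue
        neighbours = (nearest_base(ref_aln, i, -1), nearest_base(ref_aln, i, +1))
        if read_char in neighbours:
            counts["homopolymer"] += 1
        else:
            counts["other"] += 1
    return counts


# Toy alignment: an extra A extends the AAA run; the extra C matches neither neighbour.
result = classify_insertions("AAA-CG-TTA", "AAAACGCTTA")
print(result)
```

A screen of this kind only flags candidate errors. As the conclusion above notes, separating biological variation from sequencing error still relies on Sanger references and on repeated runs under different conditions.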

Source
http://dx.doi.org/10.1007/s00239-013-9603-y
