Introduction: In the fight to limit the global spread of antibiotic resistance, computational challenges associated with sequencing technology can affect the accuracy of downstream analyses, including drug resistance identification, transmission analysis, and genome resolution. About 10% of the Mycobacterium tuberculosis (MTB) genome consists of the PE/PPE family, a GC-rich, repetitive genomic region. Although short-read sequencing is widely used, its limitations in the PE/PPE regions are well recognized, because reads from these regions are difficult to map unambiguously onto the reference genome. The aim of this study was to compare the performance of short-read (SRS), long-read (LRS), and hybrid-read (HYBR) based analyses across several common investigative tasks: genome coverage estimation, variant calling and cluster analysis, drug resistance detection, and de novo assembly.

Methods: For this study, 13 model MTB clinical isolates were sequenced with both SRS and LRS. HYBR reads were produced by correcting the long reads with the short reads. The FASTQ files from the three approaches were then processed with a customized version of MTBseq for genome coverage estimation and variant calling, and with two different assemblers for the evaluation of de novo assembly.

Results: In the genome coverage estimation, SRS showed lower 8X breadth of coverage than LRS and HYBR. Within the PE/PPE genes, SRS performed poorly for the PE_PGRS family but obtained acceptable coverage in the PE and PPE genes, whereas LRS and HYBR reached optimal coverage across the PE/PPE genes. For variant calling, HYBR showed the highest resolution, detecting a higher percentage of uniquely identified mutations than LRS and SRS. All three approaches agreed on the identification of two major clusters, with HYBR identifying a higher number of SNPs between the two clusters. In terms of assembly quality, HYBR and LRS obtained better results than SRS.
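
The "8X breadth of coverage" metric used above can be illustrated with a minimal Python sketch. It assumes that breadth of coverage means the fraction of reference positions covered by at least 8 reads; the function name `breadth_of_coverage` and the example depth values are hypothetical and are not taken from the study or from MTBseq.

```python
from typing import Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 8) -> float:
    """Return the fraction of positions with read depth >= min_depth.

    `depths` holds one per-base read depth per reference position
    (hypothetical input format, e.g. parsed from a depth table).
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Illustrative per-base depths for a 10 bp region (made-up values).
example_depths = [0, 3, 8, 12, 15, 7, 9, 20, 8, 2]

# Six of the ten positions reach a depth of 8 or more.
print(f"{breadth_of_coverage(example_depths):.0%}")  # prints "60%"
```

Under this definition, a repetitive region where short reads cannot be placed unambiguously would contribute many low-depth positions and so lower the breadth value, even when the mean depth over the whole genome is high.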

Discussion: In conclusion, SRS and LRS present complementary advantages and limitations, and which one is preferable depends on the aim of the investigation. For a full resolution of MTB genomes, where all of the analyses above and both technologies are needed, the HYBR approach represents a valid option and a well-rounded strategy.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9932330
DOI: http://dx.doi.org/10.3389/fmicb.2023.1104456