TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data.

Source Code Biol Med

Department of Crop Sciences, University of Illinois at Urbana-Champaign, 1201 W. Gregory Drive, Urbana, IL 61802 USA.

Published: July 2016

Background: In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult.

Results: We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output are in CSV format so that they can be opened by spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file can be imported listing the names of markers to retain. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files.
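To make the idea of extracting read counts from barcodes and tags concrete, the following is a minimal sketch in Python 3. It is not TagDigger's code and does not reproduce its search algorithm: the function names (read_fastq_sequences, count_tags, write_counts_csv), the barcode and tag sequences, and the sample and marker names are all hypothetical, and the naive prefix scan shown here is chosen for readability rather than speed. It assumes uncompressed four-line FASTQ records in which each read begins with a barcode immediately followed by the tag sequence, and that no barcode is a prefix of another.

```python
import csv
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def count_tags(handle, barcodes, tags):
    """Count reads per (sample, tag name).

    barcodes: dict mapping barcode sequence -> sample name (hypothetical input)
    tags:     dict mapping tag sequence -> tag name (hypothetical input)
    A read is counted when it starts with a known barcode that is
    immediately followed by a known tag sequence.
    """
    counts = Counter()
    for seq in read_fastq_sequences(handle):
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                rest = seq[len(barcode):]
                for tag_seq, tag_name in tags.items():
                    if rest.startswith(tag_seq):
                        counts[(sample, tag_name)] += 1
                        break
                break  # assumes no barcode is a prefix of another
    return counts


def write_counts_csv(counts, samples, tag_names, out):
    """Write a samples-by-tags table of read counts in CSV format."""
    writer = csv.writer(out)
    writer.writerow(["sample"] + list(tag_names))
    for sample in samples:
        writer.writerow([sample] + [counts[(sample, t)] for t in tag_names])


# Made-up example data: two barcodes, two alleles (tags) of one locus.
demo_barcodes = {"ACGT": "sample1", "TTAGC": "sample2"}
demo_tags = {"CAGCAAA": "locus1_0", "CAGCAAT": "locus1_1"}
demo_reads = [
    "ACGTCAGCAAAGG",
    "ACGTCAGCAATGG",
    "ACGTCAGCAAAGG",
    "TTAGCCAGCAATCC",
    "GGGGCAGCAAAGG",  # unknown barcode, so this read is ignored
]
demo_fastq = "".join(
    "@read{}\n{}\n+\n{}\n".format(n, s, "I" * len(s))
    for n, s in enumerate(demo_reads)
)

demo_counts = count_tags(io.StringIO(demo_fastq), demo_barcodes, demo_tags)
demo_out = io.StringIO()
write_counts_csv(
    demo_counts, ["sample1", "sample2"], ["locus1_0", "locus1_1"], demo_out
)
print(demo_out.getvalue())
```

The resulting table has one row per sample and one column per tag, which is the kind of layout that opens directly in spreadsheet software. A tool intended for hundreds of millions of reads would need a faster lookup than the nested loops used here.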

Conclusions: TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. TagDigger will run on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4940913
DOI: http://dx.doi.org/10.1186/s13029-016-0057-7

