TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data.

Source Code Biol Med

Department of Crop Sciences, University of Illinois at Urbana-Champaign, 1201 W. Gregory Drive, Urbana, IL 61802 USA.

Published: July 2016

Background: In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult.

Results: We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output are in CSV format so that they can be opened by spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file can be imported listing the names of markers to retain. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files.
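To illustrate the general idea of extracting read counts with user-supplied barcodes and tags, the following is a minimal sketch in Python 3. It is not TagDigger's actual code or search algorithm: the function name `count_tags`, the dictionary layout, and the example sequences are all hypothetical, and the sketch assumes exact matching of a barcode at the start of each read followed immediately by the tag sequence.

```python
import io
from collections import defaultdict


def count_tags(fastq_handle, barcodes, tags):
    """Count reads per (sample, marker) pair in a FASTQ stream.

    Hypothetical helper, not part of TagDigger.
    barcodes: dict mapping barcode sequence -> sample name
    tags:     dict mapping tag sequence -> marker/allele name
    Assumes exact matches, with the tag starting right after the barcode.
    """
    counts = defaultdict(int)
    # Try longer barcodes and tags first so variable lengths are handled.
    barcode_lengths = sorted({len(b) for b in barcodes}, reverse=True)
    tag_lengths = sorted({len(t) for t in tags}, reverse=True)

    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the second line of each 4-line record
            continue
        read = line.strip().upper()
        for blen in barcode_lengths:
            sample = barcodes.get(read[:blen])
            if sample is None:
                continue
            insert = read[blen:]
            for tlen in tag_lengths:
                marker = tags.get(insert[:tlen])
                if marker is not None:
                    counts[(sample, marker)] += 1
                    break
            break  # stop after the first (longest) matching barcode
    return dict(counts)


# Made-up example data (assumptions for illustration only).
barcodes = {"ACGT": "sample1", "TTGCA": "sample2"}
tags = {"CAGCAAAT": "locus1_A", "CAGCAAGT": "locus1_B"}
reads = [
    "ACGTCAGCAAATGG",   # sample1, locus1_A
    "ACGTCAGCAAATCC",   # sample1, locus1_A
    "TTGCACAGCAAGTAA",  # sample2, locus1_B
    "GGGGCAGCAAATGG",   # unknown barcode, ignored
]
fastq_text = "".join(
    f"@read{n}\n{seq}\n+\n{'I' * len(seq)}\n" for n, seq in enumerate(reads)
)
counts = count_tags(io.StringIO(fastq_text), barcodes, tags)
print(counts)
```

Because each read is resolved with dictionary lookups and the file is streamed line by line, a sketch like this needs no intermediate files; writing `counts` out as rows of a CSV table would match the spreadsheet-friendly output described above.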

Conclusions: TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. TagDigger will run on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4940913
DOI: http://dx.doi.org/10.1186/s13029-016-0057-7
