Summary: CRISPR-Cas9 and shRNA high-throughput sequencing screens have abundant applications in basic and translational research. Methods and tools for analyzing these screens must properly account for sequencing error, resolve ambiguous mappings among similar sequences in the barcode library in a statistically principled manner, and be computationally efficient. Here we present bcSeq, an open-source R package that implements a fast, parallelized algorithm for mapping high-throughput sequencing reads to a barcode library while tolerating sequencing error. The algorithm uses a trie data structure for speed and resolves ambiguous mappings with a statistical sequencing error model based on the Phred scores of each read.

Availability and Implementation: The package source code and an accompanying tutorial are available at http://bioconductor.org/packages/bcSeq/.

Supplementary Information: Supplementary data are available at Bioinformatics online.
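The two ideas named in the summary, a trie lookup that tolerates mismatches and a Phred-based rule for ambiguous hits, can be illustrated with a small Python sketch. This is not bcSeq's code or API. It assumes reads and barcodes of equal length, substitution errors only, a uniform prior over barcodes, and a miscalled base that is equally likely to be any of the other three; all function names and the `max_mismatch` and `min_posterior` parameters are hypothetical.

```python
import math


class TrieNode:
    def __init__(self):
        self.children = {}      # base -> TrieNode
        self.barcode_id = None  # set on the node that ends a barcode


def build_trie(barcodes):
    """Insert every barcode into a trie keyed by base."""
    root = TrieNode()
    for idx, seq in enumerate(barcodes):
        node = root
        for base in seq:
            node = node.children.setdefault(base, TrieNode())
        node.barcode_id = idx
    return root


def candidates(root, read, max_mismatch):
    """Ids of barcodes within max_mismatch substitutions of the read."""
    hits = []
    stack = [(root, 0, 0)]  # (node, position in read, mismatches so far)
    while stack:
        node, pos, mism = stack.pop()
        if pos == len(read):
            if node.barcode_id is not None:
                hits.append(node.barcode_id)
            continue
        for base, child in node.children.items():
            cost = mism + (base != read[pos])
            if cost <= max_mismatch:  # prune branches over the budget
                stack.append((child, pos + 1, cost))
    return sorted(hits)


def log_likelihood(read, quals, barcode):
    """log P(read | barcode) under the assumed substitution-only model."""
    ll = 0.0
    for r, q, b in zip(read, quals, barcode):
        # Phred definition: error probability = 10^(-Q/10).
        # Capped at 0.75, where the base call carries no information.
        e = min(10 ** (-q / 10), 0.75)
        ll += math.log(1 - e) if r == b else math.log(e / 3)
    return ll


def map_read(root, barcodes, read, quals, max_mismatch=2, min_posterior=0.9):
    """Return (barcode id, posterior), or None if unmapped or ambiguous."""
    ids = candidates(root, read, max_mismatch)
    if not ids:
        return None
    lls = [log_likelihood(read, quals, barcodes[i]) for i in ids]
    top = max(lls)
    weights = [math.exp(l - top) for l in lls]  # shift for stability
    best = max(range(len(ids)), key=lambda k: weights[k])
    posterior = weights[best] / sum(weights)
    return (ids[best], posterior) if posterior >= min_posterior else None


barcodes = ["ACGT", "ACGA", "TTTT"]
root = build_trie(barcodes)
print(map_read(root, barcodes, "ACGT", [40, 40, 40, 40]))  # confident hit
print(map_read(root, barcodes, "ACGT", [40, 40, 40, 2]))   # low quality at the deciding base
```

In this toy library the first two barcodes differ only at the last base. A high-quality read maps to the first barcode with a posterior close to 1, while the same read with a low Phred score at that position stays below the threshold and is left unassigned rather than guessed.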

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6184561
DOI: http://dx.doi.org/10.1093/bioinformatics/bty402
