BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing.

BMC Bioinformatics

DNA Sequencing and Genomics Laboratory, Institute of Biotechnology, FIN-00014 University of Helsinki, P.O. Box 56, Finland.

Published: July 2018

Background: Current high-throughput sequencing platforms can sequence multiple samples in parallel. Samples are labeled by attaching a short, sample-specific nucleotide sequence (a barcode) to each DNA molecule before the libraries are pooled into a mix that is sequenced simultaneously. After sequencing, reads are binned by sample by identifying the barcode sequence within each read. To tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. An additional constraint, arising from both nucleotide usage and base-calling accuracy, is that the proportions of the different nucleotides should be balanced at each barcode position. Because the number of samples mixed in each sequencing run varies, a sequencing core facility faces the problem of selecting the best subset of its available barcodes for each run. Many tools exist for de novo barcode design, but they are not suited to subset selection.
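The two requirements above, that barcodes be far apart and that nucleotides be balanced per position, can be made concrete with a short sketch. It assumes equal-length barcodes and uses Hamming distance (the number of differing positions) as the distance in sequence space; with a minimum pairwise distance d, up to d − 1 substitutions can be detected and up to ⌊(d − 1)/2⌋ corrected. The balance score (spread between the most and least frequent nucleotide at each position, summed over positions) and the function names `hamming`, `min_pairwise_distance`, `position_imbalance` and `is_compatible` are hypothetical illustrations, not BARCOSEL's actual cost function or code.

```python
from itertools import combinations
from typing import Sequence

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance over all pairs (needs at least two barcodes)."""
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes")
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def position_imbalance(barcodes: Sequence[str]) -> int:
    """Hypothetical balance score: for each position, the difference between the
    most and least frequent nucleotide count, summed over positions.
    0 means every position is perfectly even."""
    total = 0
    for column in zip(*barcodes):  # one tuple of bases per barcode position
        counts = [column.count(base) for base in BASES]
        total += max(counts) - min(counts)
    return total


def is_compatible(barcodes: Sequence[str], min_distance: int) -> bool:
    """Hypothetical compatibility check: every pair of barcodes must be at least
    `min_distance` substitutions apart."""
    return min_pairwise_distance(barcodes) >= min_distance


if __name__ == "__main__":
    pool = ["ACGT", "CATG", "GTAC", "TGCA"]  # illustrative barcodes
    print(min_pairwise_distance(pool), position_imbalance(pool))  # 4 0
```

Here `is_compatible` corresponds loosely to checking whether libraries with existing barcodes can share a pool, but only for the distance criterion; a real check would also weigh nucleotide balance.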

Results: We have developed a tool that can be used for three tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach, the selection process is formulated as a minimization problem. We define a cost function and a set of constraints, and use integer programming to solve the resulting combinatorial problem. Given the desired number of barcodes and the set of candidate sequences supplied by the user, the necessary constraints are generated automatically and the optimal solution is found. The method is implemented in the C programming language, and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel.
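The selection task can be illustrated as a constrained minimization over subsets. The sketch below is not BARCOSEL's method: it substitutes plain exhaustive enumeration of all k-subsets for integer programming, so it is only practical for short candidate lists. It assumes equal-length barcodes, a Hamming-distance constraint, and the same hypothetical spread-based balance cost as a stand-in for the paper's cost function. The optional `fixed` argument (a hypothetical parameter) covers the augmentation task by forcing existing barcodes into every candidate pool.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Differing positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def imbalance(barcodes: Sequence[str]) -> int:
    """Hypothetical cost: per-position spread of nucleotide counts, summed."""
    total = 0
    for column in zip(*barcodes):
        counts = [column.count(base) for base in BASES]
        total += max(counts) - min(counts)
    return total


def select_barcodes(
    candidates: Sequence[str],
    k: int,
    min_distance: int,
    fixed: Sequence[str] = (),
) -> Optional[Tuple[str, ...]]:
    """Return the k-barcode pool (containing every barcode in `fixed`) with the
    lowest imbalance whose pairs are all at least `min_distance` apart, or None
    if no pool satisfies the constraint. Brute force, not integer programming."""
    need = k - len(fixed)
    if need < 0:
        raise ValueError("more fixed barcodes than the requested pool size")
    pool = [c for c in candidates if c not in fixed]
    best: Optional[Tuple[str, ...]] = None
    best_cost: Optional[int] = None
    for extra in combinations(pool, need):
        chosen = tuple(fixed) + extra
        # Constraint: every pair must be far enough apart to tolerate errors.
        if any(hamming(a, b) < min_distance for a, b in combinations(chosen, 2)):
            continue
        # Objective: keep the feasible pool with the most even nucleotide usage.
        cost = imbalance(chosen)
        if best_cost is None or cost < best_cost:
            best, best_cost = chosen, cost
    return best


if __name__ == "__main__":
    candidates = ["ACGT", "CATG", "GTAC", "TGCA", "AAAA", "ACGA"]  # illustrative
    print(select_barcodes(candidates, 4, 3))  # ('ACGT', 'CATG', 'GTAC', 'TGCA')
    print(select_barcodes(candidates, 2, 3, fixed=("AAAA",)))  # ('AAAA', 'ACGT')
```

Enumeration grows combinatorially with the number of candidates, which is why a formulation as an integer program, as the abstract describes, is the scalable choice; the brute-force version is useful mainly for checking small cases.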

Conclusions: The increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes from a larger existing barcode set so that sequencing errors are tolerated and nucleotide balance is optimized. The tool is easily accessible via a web browser.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034344
DOI: http://dx.doi.org/10.1186/s12859-018-2262-7
