Background: New high-throughput pyrosequencers such as the 454 Life Sciences GS 20 massively parallelize DNA sequencing, providing an unprecedented rate of output data and potentially reducing costs. However, these pyrosequencers have a different error profile and produce shorter reads than traditional Sanger sequencers. These differences pose new challenges for how the data are handled and analyzed. In addition, the steep increase in sequencer throughput calls for substantial computing power at low cost.

Results: To address these challenges, we created an automated multi-step computation pipeline integrated with a database storage system. This allowed us to store, handle, index and search (1) the output data from the GS 20 sequencer; (2) analysis projects, possibly several per dataset; (3) final results of analysis computations; and (4) intermediate results of computations, which allow biologists to make manual comparisons and run further searches. Repeatability of computations was also a requirement. To access the necessary computing power, we ported the pipeline to the European Grid, a large community of clusters that is load-balanced as a whole. To facilitate this port we created Vnas, an innovative framework for Grid job submission, virtual sandbox management and job callbacks. After several pipeline runs aimed at tuning the parameters and thresholds for optimal results, we successfully analyzed 273 sequenced amplicons from a cancerous human sample and correctly found point mutations confirmed by either Sanger resequencing or NCBI dbSNP. The sequencing was performed with our 454 Life Sciences GS 20 pyrosequencer.

Conclusion: We handled the steep increase in throughput from the new pyrosequencer by building an automated computation pipeline coupled with database storage and by leveraging the computing power of the European Grid. The Grid platform is a very cost-effective choice for uneven workloads, which are typical of many scientific research fields, provided its peculiarities can be accepted (these are discussed). This infrastructure was used to analyze human amplicons for mutations. More analyses will be performed in the future.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1885852
DOI: http://dx.doi.org/10.1186/1471-2105-8-S1-S22
