The Sequence Read Archive: explosive growth of sequencing data.

Nucleic Acids Res

Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Yata, Mishima 411-8540, Japan.

Published: January 2012

New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245110
DOI: http://dx.doi.org/10.1093/nar/gkr854

Similar Publications

Helminths infection of Schizothorax niger in Kashmir, India: morphological and molecular characterization.

Mol Biol Rep

January 2025

Division of Animal Biotechnology, Faculty of Veterinary Sciences & Animal Husbandry, SKUAST-K, Srinagar, India.

Background: The identification of helminth parasites in Schizothorax spp. from Kashmir, including Schyzocotyle acheilognathi, Pomphorhynchus kashmirensis, and Adenoscolex oreini, is hindered by morphological limitations and high intraspecific variation. While previous studies have relied on morphological diagnosis, a comprehensive molecular characterization is lacking.

Variant calling using long-read RNA sequencing (lrRNA-seq) can be applied to diverse tasks, such as capturing full-length isoforms and gene expression profiling. It poses challenges, however, due to higher error rates than DNA data, the complexities of transcript diversity, RNA editing events, etc. In this paper, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data.

Introduction: Structural variants (SVs) of the nebulin gene (NEB), including intragenic duplications, deletions, and copy number variation of the triplicate region, are an established cause of recessively inherited nemaline myopathies and related neuromuscular disorders. Large deletions have been shown to cause dominantly inherited distal myopathies. Here we provide an overview of 35 families with muscle disorders caused by such SVs in NEB.

Gene fusions are common primary drivers of pediatric leukemias and are the result of underlying structural variants (SVs). Current clinical workflows to detect such alterations rely on a multimodal approach, which often increases analysis time and overall cost of testing. In this study, we used long-read sequencing (lrSeq) as a proof-of-concept to determine whether clinically relevant (cr) SVs could be detected within a small (n = 17) pediatric leukemia cohort.

Background And Aims: Familial hypercholesterolemia (FH) and other disorders with similar features are common genetic disorders that remain underdiagnosed and undertreated, due in part to the cost of screening. The aim of this study was to design and implement a whole gene targeted NGS panel for the molecular diagnosis of FH and statin intolerance with an emphasis on high quality variant calling, including copy number analysis.

Methods: A whole gene panel for hybridisation-based short read NGS was designed for the dominant FH-genes low density lipoprotein receptor (LDLR), apolipoprotein B (APOB), proprotein convertase subtilisin/kexin type 9 (PCSK9), apolipoprotein E (APOE) and the recessive FH-genes low density lipoprotein receptor adaptor protein 1 (LDLRAP1), ATP binding cassette subfamily G member 5/8 (ABCG5/8) and lipase A, lysosomal acid type (LIPA), as well as solute carrier organic anion transporter family member 1B1 (SLCO1B1), not an FH gene but linked to statin intolerance.
