Geoseq: a tool for dissecting deep-sequencing datasets.

James Gurtowski, Anthony Cancio, Hardik Shah, Chaya Levovitz, Ajish George, Robert Homann, Ravi Sachidanandam

BMC Bioinformatics

Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, 1425 Madison Avenue, New York, NY 10029, USA.

Published: October 2010

Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have seen little use because datasets of interest are difficult to locate and analyze.

Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep-sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and uses suffix arrays to measure the coverage of each tile in a sequence library. The user can upload custom target sequences or search by gene/miRNA name, and results are returned as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment.
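
The tiling idea in the Results can be illustrated with a small sketch. It is not Geoseq's implementation: the function names, the `$` read separator, the tile length, and the naive suffix-array construction are all assumptions made for illustration. The sketch cuts a reference sequence into overlapping tiles and counts exact occurrences of each tile in a read library by binary search over a suffix array.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction (sorts full suffixes); adequate for a toy example,
    far too slow for a real sequencing library.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, pattern):
    """Count exact occurrences of pattern in text via binary search."""
    k = len(pattern)

    def bound(strict):
        # strict=False: first suffix whose k-prefix is >= pattern
        # strict=True:  first suffix whose k-prefix is >  pattern
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            start = suffix_array[mid]
            prefix = text[start:start + k]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return bound(True) - bound(False)


def tile_sequence(reference, tile_len, step=1):
    """Cut the reference into overlapping tiles as (position, tile) pairs."""
    last_start = len(reference) - tile_len
    return [(pos, reference[pos:pos + tile_len])
            for pos in range(0, last_start + 1, step)]


def tile_coverage(reference, reads, tile_len, step=1):
    """Return (position, count) for each tile of the reference.

    Reads are joined with '$' (assumed absent from the sequence alphabet)
    so that no tile can match across a read boundary.
    """
    text = "$".join(reads)
    suffix_array = build_suffix_array(text)
    return [(pos, count_occurrences(text, suffix_array, tile))
            for pos, tile in tile_sequence(reference, tile_len, step)]


# Hypothetical toy data, not taken from any real library.
example_reads = ["ACGTAC", "CGTACG", "GTTGAA"]
example_coverage = tile_coverage("ACGTACGTTG", example_reads, tile_len=4)
```

In this sketch the suffix array is built once per library and then reused for every tile, which loosely mirrors the idea of holding pre-computed data for public libraries. A tile with zero count marks a stretch of the reference that has no exact match in the reads.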

Conclusions: Geoseq simplifies the analysis of small sets of sequences against deep-sequencing datasets, as well as the identification of public datasets of interest. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2972303
DOI: http://dx.doi.org/10.1186/1471-2105-11-506
