TranSeqAnnotator: large-scale analysis of transcriptomic data.

BMC Bioinformatics

Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence, Macquarie University, Sydney, NSW 2109, Australia.

Published: May 2013

AI Article Synopsis

  • The transcriptome can be analyzed using expressed sequence tag (EST) data sets, which provide a quick and cost-effective method for understanding an organism's genome and proteome, necessitating automated tools for efficient data handling.
  • TranSeqAnnotator is a bioinformatics workflow designed for large-scale transcriptomic data analysis that automates processes such as cleaning, clustering, and annotation of sequences, and can identify excretory/secretory proteins.
  • It is developed for Linux clusters, outputs extensive functional and ontological information, and helps identify potential therapeutic targets, with free access provided to the scientific community.

Article Abstract

Background: The transcriptome of an organism can be studied by analysing expressed sequence tag (EST) data sets, a rapid and cost-effective approach supported by several new and updated bioinformatics tools for assembly and annotation. Such analyses complement genome and proteome analysis in building a comprehensive understanding of an organism. With the advent of large-scale sequencing projects and the generation of sequence data at the protein and cDNA levels, an automated analysis pipeline is necessary to store, organize and annotate ESTs.

Results: TranSeqAnnotator is a workflow for large-scale analysis of transcriptomic data that brings together the most appropriate bioinformatics tools for data management and analysis. The pipeline automatically cleans, clusters and assembles the input, generates consensus sequences, conceptually translates these into possible protein products, and assigns putative function based on various DNA and protein similarity searches. Excretory/secretory (ES) proteins inferred from ESTs/short reads are also identified. TranSeqAnnotator accepts raw and quality ESTs in FASTA format, along with protein and short-read sequences, and analyses them with user-selected programs. After pre-processing and assembly, the dataset is annotated at the nucleotide, protein and ES protein levels.
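The conceptual translation step described above can be illustrated with a small sketch. This is not TranSeqAnnotator's own code, and the abstract does not say which program the pipeline uses for this step; the function names (`read_fasta`, `translate`, `longest_orf`) are hypothetical, and the approach shown (six-frame translation with the standard genetic code, keeping the longest methionine-initiated stretch) is only one simple way such a step might be approximated.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def read_fasta(text):
    """Parse FASTA text into {identifier: uppercase sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Longest M-initiated peptide over all six reading frames.

    A stretch that runs to the end of the sequence without a stop codon
    is still accepted, since ESTs are often partial transcripts.
    """
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


if __name__ == "__main__":
    example = ">est1 hypothetical example read\nccatgg\ncttgatt\n"
    for name, seq in read_fasta(example).items():
        print(name, longest_orf(seq))
```

A real pipeline would normally delegate this step to a dedicated tool that models sequencing errors and frameshifts, which a naive six-frame scan like this one does not handle.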

Conclusion: TranSeqAnnotator has been developed on a Linux cluster to perform an exhaustive and reliable analysis and provide detailed annotation. It outputs gene ontologies and protein functional identifications in terms of mapping to protein domains and metabolic pathways. The pipeline has been applied to annotate large EST datasets, identifying several novel and known genes, including some with experimental therapeutic validation, that could serve as potential targets for parasite intervention. TranSeqAnnotator is freely available to the scientific community at http://estexplorer.biolinfo.org/TranSeqAnnotator/.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521237
DOI: http://dx.doi.org/10.1186/1471-2105-13-S17-S24
