Reference-free transcriptome assembly in non-model animals from next-generation sequencing data.

Mol Ecol Resour

CNRS UMR 5554, Institut des Sciences de l'Evolution de Montpellier, Université Montpellier 2, Place E. Bataillon, 34095 Montpellier, France.

Published: September 2012

Next-generation sequencing (NGS) technologies offer the opportunity for population genomic studies of non-model organisms sampled in the wild. The transcriptome is a convenient and popular target for such purposes. However, designing genetic markers from NGS transcriptome data requires assembling gene-coding sequences out of short reads. This is a complex task owing to gene duplications, genetic polymorphism, alternative splicing and transcription noise. Typical assembly programmes return thousands of predicted contigs, whose connection to the species' true gene content is unclear and from which SNPs are difficult to define. Here, the transcriptomes of five diverse non-model animal species (hare, turtle, ant, oyster and tunicate) were assembled from newly generated 454 and Illumina sequence reads. In two species for which a reference genome is available, a new procedure was introduced to annotate each predicted contig as either a full-length cDNA, fragment, chimera, allele, paralogue, genomic sequence or other, based on the number of, and overlap between, BLAST hits to the appropriate reference. Analyses showed that (i) the highest quality assemblies are obtained when 454 and Illumina data are combined, (ii) typical de novo assemblies include a majority of irrelevant cDNA predictions and (iii) assemblies can be appropriately cleaned by filtering contigs based on length and coverage. We conclude that robust, reference-free assembly of thousands of genes from transcriptomic NGS data is possible, opening promising perspectives for transcriptome-based population genomics in animals. A Galaxy pipeline implementing our best-performing assembly strategy is provided.
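The cleaning step in point (iii), filtering contigs on length and coverage, can be illustrated with a minimal Python sketch. It is not the authors' Galaxy pipeline: the `Contig` record, the `filter_contigs` function and the threshold values (`min_length=200`, `min_coverage=2.0`) are hypothetical and chosen only for illustration, since the abstract does not state the cut-offs used.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Contig:
    """Hypothetical record for one assembled contig."""
    name: str
    sequence: str
    mean_coverage: float  # average read depth across the contig


def filter_contigs(
    contigs: Iterable[Contig],
    min_length: int = 200,        # assumed threshold, not from the paper
    min_coverage: float = 2.0,    # assumed threshold, not from the paper
) -> List[Contig]:
    """Keep contigs that are both long enough and well enough covered."""
    return [
        c for c in contigs
        if len(c.sequence) >= min_length and c.mean_coverage >= min_coverage
    ]


# Usage: only the contig passing both thresholds is retained.
example = [
    Contig("a", "ACGT" * 125, 10.0),  # 500 bp, well covered: kept
    Contig("b", "ACGT" * 25, 50.0),   # 100 bp: too short
    Contig("c", "ACGT" * 200, 1.0),   # 800 bp, coverage too low
]
kept = filter_contigs(example)
print([c.name for c in kept])  # ['a']
```

In practice the thresholds would be tuned per data set, for example by checking how the proportion of contigs annotated as full-length cDNA changes as the cut-offs are raised.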

Source
http://dx.doi.org/10.1111/j.1755-0998.2012.03148.x

Publication Analysis

Top Keywords

next-generation sequencing: 8
454 illumina: 8
reference-free transcriptome: 4
transcriptome assembly: 4
assembly non-model: 4
non-model animals: 4
animals next-generation: 4
data: 4
sequencing data: 4
data next-generation: 4

Similar Publications

Progress on ancient DNA investigation of Late Quaternary mammals in China.

Yi Chuan

January 2025

State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan 430078, China.

More than 40 years have passed since researchers began exploring the genetic composition of ancient organisms through ancient DNA. Over the past 20 years, with the development and application of high-throughput sequencing platforms and improved efficiency in retrieving highly fragmented DNA molecules, ancient DNA research has moved into a new era of deep-time paleogenomics. This work has not only resolved many controversial phylogenetic problems and enriched our knowledge of the migration and evolution of various organisms, including humans, but has also launched the exploration of molecular responses to climate change at the "whole-genome, big-data, multi-species" level.

The reduced cost of next-generation sequencing (NGS) has allowed researchers to generate nuclear and mitochondrial genome data to gain deeper insights into the phylogeography, evolutionary history and biology of non-model species. While the Cape buffalo has been well-studied across its range with traditional genetic markers over the last 25 years, researchers are building on this knowledge by generating whole genome, population-level data sets to improve understanding of the genetic composition and evolutionary history of the species. Using publicly available NGS data, we assembled 40 Cape buffalo mitochondrial genomes (mitogenomes) from four protected areas in South Africa, expanding the geographical range and almost doubling the number of mitogenomes available for this species.

Background: Early diagnosis of systemic light-chain amyloidosis (AL) is needed because 25% of patients die within months of diagnosis. In patients with monoclonal gammopathy of undetermined significance (MGUS) or smoldering multiple myeloma (SMM) of the λ isotype, we explored the use of 2 screening variables: a free light chain difference of 23 mg/L between λ and κ and presence of IGLV genes that occur more frequently in AL.

Methods: Patients contacted us and we sent HIPAA release and consent forms for discussion by phone.

The new HLA-B*35:01:80 allele showed one synonymous nucleotide difference compared to the HLA-B*35:01:01:01 allele in codon 137.

HLA-DRB1*08:130 shows a leucine at position 64, which has not been described previously.
