Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements.

PLoS One

Department of Biology, Stanford University, Stanford, California, United States of America; Institut des Sciences de l'Evolution-Montpellier, Montpellier, France.

Published: May 2015

High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly-parallel library preparation and local assembly of short read data and which achieve lengths of 1.5-18.5 Kbp with an extremely low error rate (~0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp) achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long-reads, offer a powerful approach to improve de novo assemblies of whole genomes.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4154752 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0106689 (PLOS)

Similar Publications

Messenger RNA capture sequencing of extracellular RNA from human biofluids using a comprehensive set of spike-in controls.

STAR Protoc

June 2021

Center for Medical Genetics, Department of Biomolecular Medicine, OncoRNALab, Ghent University, C. Heymanslaan 10, 9000 Ghent, Belgium.

Comprehensive transcriptome analysis of extracellular RNA (exRNA) purified from human biofluids is challenging because of the low RNA concentration and compromised RNA integrity. Here, we describe an optimized workflow to (1) isolate exRNA from different types of biofluids and (2) to prepare messenger RNA (mRNA)-enriched sequencing libraries using complementary hybridization probes. Importantly, the workflow includes 2 sets of synthetic spike-in RNA molecules as processing controls for RNA purification and sequencing library preparation and as an alternative data normalization strategy.

Using FFPE RNA-Seq with 12 marker genes to evaluate genotoxic and non-genotoxic rat hepatocarcinogens.

Genes Environ

March 2020

Division of Molecular Target and Gene Therapy Products, National Institute of Health Sciences, 3-25-26, Tonomachi, Kawasaki-ku, 210-9501 Japan.

Introduction: Various challenges have been overcome with regard to applying 'omics technologies for chemical risk assessments. Previously we published results detailing targeted mRNA sequencing (RNA-Seq) on a next generation sequencer using intact RNA derived from freshly frozen rat liver tissues. We successfully discriminated genotoxic hepatocarcinogens (GTHCs) from non-genotoxic hepatocarcinogens (NGTHCs) using 11 selected marker genes.

Recent advances in long fragment read (LFR, also known as linked-read technologies or read-cloud) technologies, such as single tube long fragment reads (stLFR), 10X Genomics Chromium reads, and TruSeq synthetic long-reads, have enabled efficient haplotyping and genome assembly. However, in the case of stLFR and 10X Genomics Chromium reads, the long fragments of a genome are covered sparsely by reads in each barcode and most barcodes are contained in multiple long fragments from different regions, which results in inefficient assembly when using long-range information. Thus, methods to address these shortcomings are vital for capitalizing on the additional information obtained using these technologies.

Extracellular vesicles (EVs) have great potential as a source for clinically relevant biomarkers since they can be readily isolated from biofluids and carry microRNA (miRNA), mRNA, and proteins that can reflect disease status. However, the biological and technical variability of EV content is unknown making comparisons between healthy subjects and patients difficult to interpret. In this study, we sought to establish a laboratory and bioinformatics analysis pipeline to analyse the small RNA content within EVs from patient serum that could serve as biomarkers and to assess the biological and technical variability of EV RNA content in healthy individuals.

The sugarcane mitochondrial genome: assembly, phylogenetics and transcriptomics.

PeerJ

September 2019

Computational, Evolutionary and Systems Biology Laboratory, Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil.

Background: Chloroplast genomes provide insufficient phylogenetic information to distinguish between closely related sugarcane cultivars, due to the recent origin of many cultivars and the conserved sequence of the chloroplast. In comparison, the mitochondrial genome of plants is much larger and more plastic and could contain increased phylogenetic signals. We assembled a consensus reference mitochondrion with Illumina TruSeq synthetic long reads and Oxford Nanopore Technologies MinION long reads.
