Parallel short sequence assembly of transcriptomes.

BMC Bioinformatics

Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA.

Published: January 2009

Background: The de novo assembly of genomes and transcriptomes from short sequences is a challenging problem. Because assembling short sequences requires high coverage, and because modeling assembly as a graph problem adds overhead, methods for short sequence assembly are often validated using data from BACs or small prokaryotic genomes.

Results: We present a parallel method for transcriptome assembly from large short sequence data sets. Our solution uses a rigorous graph-theoretic framework and tames the computational and space complexity using parallel computers. First, we construct a distributed bidirected graph that captures overlap information. Next, we compact all chains in this graph to determine long unique contigs using undirected parallel list ranking, a problem for which we present an algorithm. Finally, we process this compacted distributed graph to resolve unique regions that are separated by repeats, exploiting the naturally occurring coverage variations that arise from differential expression.
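As a rough illustration of the list ranking step, the sketch below shows classic pointer jumping on a directed linked list, simulated sequentially. It is not the undirected algorithm the authors present, and it is not distributed; the function name `list_rank` and the `succ` array layout are assumptions made for this example. Each pass of the outer loop stands in for one synchronous parallel round in which every node doubles the distance its pointer spans, so a chain of n nodes is ranked in about log2(n) rounds.

```python
def list_rank(succ):
    """Rank each node by its distance to the tail of its list.

    succ[i] is the index of node i's successor, or None at the tail.
    (Hypothetical layout chosen for this sketch.)
    """
    n = len(succ)
    nxt = list(succ)
    # A tail node is 0 steps from the end; every other node starts at 1.
    rank = [0 if s is None else 1 for s in succ]

    changed = True
    while changed:
        changed = False
        # Read from the old arrays and write to new ones, so that the
        # loop behaves like one synchronous step over all nodes.
        new_rank = rank[:]
        new_nxt = nxt[:]
        for i in range(n):
            j = nxt[i]
            if j is not None:
                new_rank[i] = rank[i] + rank[j]  # add the span of the node we skip over
                new_nxt[i] = nxt[j]              # jump the pointer past it
                changed = True
        rank, nxt = new_rank, new_nxt
    return rank


# Chain 3 -> 0 -> 2 -> 1, with node 1 as the tail.
print(list_rank([2, None, 1, 0]))  # [2, 0, 1, 3]
```

Once every node in a chain knows its rank, the nodes can be sorted by rank and their sequences concatenated into a single contig. In a bidirected overlap graph a chain has no fixed direction in advance, which is presumably why the method needs an undirected variant.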

Conclusion: We demonstrate the validity of our method using a synthetic high-coverage data set generated from the predicted coding regions of Zea mays. We assemble 925 million sequences consisting of 40 billion nucleotides in a few minutes on a 1024-processor Blue Gene/L. Our method is the first fully distributed method for assembling a non-hierarchical short sequence data set and can scale to large problem sizes.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648799
DOI: http://dx.doi.org/10.1186/1471-2105-10-S1-S14

