Background: The de novo assembly of genomes and transcriptomes from short sequences is a challenging problem. Because short sequences must be assembled at high coverage, and because modeling assembly as a graph problem adds overhead, methods for short-sequence assembly are often validated using data from BACs or small prokaryotic genomes.
Results: We present a parallel method for transcriptome assembly from large short-sequence data sets. Our solution uses a rigorous graph-theoretic framework and tames the computational and space complexity using parallel computers. First, we construct a distributed bidirected graph that captures overlap information. Next, we compact all chains in this graph to determine long unique contigs using undirected parallel list ranking, a problem for which we present an algorithm. Finally, we process this compacted distributed graph to resolve unique regions that are separated by repeats, exploiting the naturally occurring coverage variations arising from differential expression.
Conclusion: We demonstrate the validity of our method using a synthetic high-coverage data set generated from the predicted coding regions of Zea mays. We assemble 925 million sequences consisting of 40 billion nucleotides in a few minutes on a 1024-processor Blue Gene/L. Our method is the first fully distributed method for assembling a non-hierarchical short-sequence data set and can scale to large problem sizes.
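The chain compaction step named in the Results can be illustrated with a small serial sketch. This is not the authors' algorithm: it assumes a plain undirected graph held in memory on one machine, with sortable node ids, whereas the method described above works on a distributed bidirected graph and uses parallel list ranking. The function name `compact_chains` and the toy edge list are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def compact_chains(edges):
    """Collapse maximal non-branching paths of an undirected graph.

    edges: iterable of (u, v) pairs with sortable node ids (an assumption).
    Returns a list of chains, each a list of node ids in path order.
    Nodes of degree > 2 are treated as branch points and are returned
    as single-node chains after all compacted paths.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # A node can sit inside a chain only if it has at most two neighbors.
    chain_nodes = {n for n in adj if len(adj[n]) <= 2}

    def chain_neighbors(n):
        # Sorted for deterministic output.
        return sorted(m for m in adj[n] if m in chain_nodes)

    visited = set()
    chains = []

    def walk(start):
        # Follow unvisited chain nodes until the path cannot be extended.
        chain = [start]
        visited.add(start)
        cur = start
        while True:
            nxt = [m for m in chain_neighbors(cur) if m not in visited]
            if not nxt:
                return chain
            cur = nxt[0]
            chain.append(cur)
            visited.add(cur)

    # Pass 1: paths that have an end (fewer than two chain neighbors).
    for n in sorted(chain_nodes):
        if n not in visited and len(chain_neighbors(n)) < 2:
            chains.append(walk(n))

    # Pass 2: whatever remains lies on isolated cycles.
    for n in sorted(chain_nodes):
        if n not in visited:
            chains.append(walk(n))

    # Branch points are kept as their own single-node entries.
    for n in sorted(adj):
        if n not in chain_nodes:
            chains.append([n])

    return chains


# Toy graph: path 1-2-3 meets branch node 4, which also leads to 5 and 6-7.
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (6, 7)]
result = compact_chains(toy_edges)
```

In this toy graph, node 4 has three neighbors, so the paths on either side of it are compacted separately. A parallel list ranking formulation would instead assign each node its position within its chain without a sequential walk.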
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648799 | PMC |
http://dx.doi.org/10.1186/1471-2105-10-S1-S14 | DOI Listing |