Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms.

BMC Bioinformatics

South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.

Published: February 2015

Background: De novo transcriptome assembly of short transcribed fragments (transfrags) produced from sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common-word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrag annotations and propagation of mis-assemblies.

Results: Here, we propose a structured pipeline framework for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error-tolerant CDS prediction, 3) identification of coding potential, and 4) a multiple-domain architecture annotation that complements BLAST and reduces non-specific domain annotation. We demonstrate that, independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is assessed on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags than the multiple k-mer assemblies. However, Trinity identified a number of predicted coding sequences and gene loci comparable to the Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of the cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4% and 6% are lost when orphan transfrags are excluded, which represents only a tiny fraction of the annotation derived from functional transference by sequence similarity. The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33% to 50%, which is also high for domain-based annotation. We showed that unselected transfrags were mostly truncated and represented sequences from intronic, untranslated (5' and 3') regions and non-coding gene loci.
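To make the first two steps listed above more concrete, the following Python sketch shows one simple way that removal of identical subsequences and a basic CDS search could look. It is an illustration under stated assumptions, not the IFRAT implementation: the function names (remove_contained, longest_orf) are hypothetical, containment is checked by exact string matching on both strands, and the ORF search is a naive ATG-to-stop scan of the three forward frames that is not error tolerant and does not score coding potential.

```python
# Illustrative sketch only; names and logic are assumptions, not IFRAT's code.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(transfrags):
    """Drop transfrags that are exact subsequences of a longer kept transfrag.

    `transfrags` maps a name to its sequence. Sequences are visited from
    longest to shortest, so a sequence is dropped if it (or its reverse
    complement) occurs verbatim inside any sequence already kept.
    Exact duplicates collapse to a single entry.
    """
    kept = {}
    ordered = sorted(transfrags.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept.values()):
            continue
        kept[name] = seq
    return kept


def longest_orf(seq):
    """Return the longest ATG..stop ORF (stop codon included) on the forward strand.

    Scans the three forward reading frames and returns "" when no complete
    ORF is found. A real CDS predictor would also handle the reverse
    strand, partial ORFs and sequencing errors.
    """
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                start = None
    return best
```

In use, remove_contained would be applied to a dictionary of assembled sequences and longest_orf to each surviving sequence; transfrags with no ORF would then be candidates for exclusion, pending the coding-potential and domain-annotation steps that this sketch does not cover.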

Conclusions: IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4344733
DOI: http://dx.doi.org/10.1186/s12859-015-0492-5

Publication Analysis

Top Keywords

bona fide: 16
fide transfrags: 16
transfrags: 10
transcriptome assembly: 8
post-assembly processing: 8
functionally relevant: 8
relevant assembly-derived: 8
assembly-derived transcripts: 8
coding potential: 8
ifrat selects: 8

Similar Publications

Antibodies to the RBD of SARS-CoV-2 spike mediate productive infection of primary human macrophages.

Nat Commun

December 2024

Department of Infectious Diseases, School of Immunology & Microbial Sciences, King's College London, London, SE1 9RT, UK.

The role of myeloid cells in the pathogenesis of SARS-CoV-2 is well established, in particular as drivers of cytokine production and systemic inflammation characteristic of severe COVID-19. However, the potential for myeloid cells to act as bona fide targets of productive SARS-CoV-2 infection, and the specifics of entry, remain unclear. Using a panel of anti-SARS-CoV-2 monoclonal antibodies (mAbs) we performed a detailed assessment of antibody-mediated infection of monocytes/macrophages.

Background: The proliferation capacity of adult cardiomyocytes is very limited in the normal adult mammalian heart. Previous studies implied that cardiomyocyte proliferation increases after injury stimulation, but the result is controversial partly due to different methodologies. We aim to evaluate whether myocardial infarction (MI) stimulates cardiomyocyte proliferation in adult mice.

Genomic analysis of Salmonella isolated from surface water and animal sources in Chile reveals new T6SS effector protein candidates.

Front Microbiol

December 2024

Núcleo de Investigación en One Health, Facultad de Medicina Veterinaria y Agronomía, Universidad de Las Américas, Santiago, Chile.

Type VI Secretion Systems (T6SS), widely distributed in Gram-negative bacteria, contribute to interbacterial competition and pathogenesis through the translocation of effector proteins to target cells. Salmonella harbor 5 pathogenicity islands encoding T6SS (SPI-6, SPI-19, SPI-20, SPI-21 and SPI-22), in which a limited number of effector proteins have been identified. Previous analyses by our group focused on the identification of candidate T6SS effectors and cognate immunity proteins in genomes deposited in public databases.

Assessing the impact of conformational perturbants on folding and aggregation pathways of a β-barrel fold.

Biochem Biophys Res Commun

December 2024

Department of Biological Chemistry, School of Pharmacy and Biochemistry, University of Buenos Aires and Institute of Chemistry and Biological Physical Chemistry (IQUIFIB, UBA-CONICET), Junin 956, 1113, Buenos Aires, Argentina.

Here we explore the interplay between physical and chemical perturbants to unravel links among native folding, amorphous and ordered aggregation scenarios in IFABP (rat intestinal fatty acid binding protein). This small beta-barrel protein undergoes amyloid-like aggregation above 15 % v/v trifluoroethanol. Our aim was to address the influence of sub-aggregating TFE concentrations on the unfolding transitions of IFABP.

Improving PD-1 blockade plus chemotherapy for complete remission of lung cancer by nanoPDLIM2.

Elife

December 2024

UPMC Hillman Cancer Center, Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, United States.

Immune checkpoint inhibitors (ICIs), and their combination with other therapies such as chemotherapy, fail in most cancer patients. We previously identified the PDZ-LIM domain-containing protein 2 (PDLIM2) as a bona fide tumor suppressor that is repressed in lung cancer to drive cancer and its chemo- and immunotherapy resistance, suggesting a new target for lung cancer therapy improvement. In this study, human clinical samples and data were used to investigate genetic and epigenetic changes in lung cancer.
