CAARS: comparative assembly and annotation of RNA-Seq data.

Bioinformatics

UnivLyon, Université Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France.

Published: July 2019

AI Article Synopsis

  • RNA sequencing (RNA-Seq) is commonly used for analyzing transcript sequences in non-model organisms, but existing bioinformatics methods don't fully leverage reference data from related species for better results.
  • The CAARS pipeline was developed to merge new RNA-Seq data with existing multi-species gene family data, improving assembly and annotation processes, especially in challenging cases like rodents and fishes.
  • CAARS, available for free on GitHub, offers enhanced RNA-Seq assembly accuracy and completeness compared to standard methods, alongside useful gene family alignments and phylogenetic information for comparative studies.

Article Abstract

Motivation: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction.

Results: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses.
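The abstract does not say how the phylogenetic classification into orthologs and paralogs is done. As an illustration only, the sketch below uses one common approach, the species-overlap rule: an internal node of a rooted gene tree counts as a duplication when its two subtrees share a species, and as a speciation otherwise. The tree, the `species|gene` leaf labels and the function names are hypothetical and are not taken from CAARS.

```python
# Minimal sketch, assuming a rooted binary gene tree written as nested tuples.
# Leaves are hypothetical "species|gene" labels; "new|T1" stands for a newly
# assembled transcript placed in an existing gene family.
tree = (("mouse|A1", "rat|A1"), ("mouse|A2", "new|T1"))


def species(leaf):
    """Return the species part of a 'species|gene' label."""
    return leaf.split("|")[0]


def leaves(node):
    """List all leaf labels under a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label each leaf pair by the event inferred at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    # Species-overlap rule: shared species on both sides implies a duplication.
    shared = {species(x) for x in left_leaves} & {species(x) for x in right_leaves}
    relation = "paralogs" if shared else "orthologs"
    for a in left_leaves:
        for b in right_leaves:
            out[frozenset((a, b))] = relation
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


relations = classify_pairs(tree)
for pair, relation in sorted(relations.items(), key=lambda kv: sorted(kv[0])):
    print(" / ".join(sorted(pair)), "->", relation)
```

In this toy tree the root is inferred as a duplication because mouse appears on both sides, so the new transcript is labelled an ortholog of `mouse|A2` and a paralog of `mouse|A1` and `rat|A1`.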

Availability and Implementation: CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6596894
DOI: http://dx.doi.org/10.1093/bioinformatics/bty903


