Application of an optimized annotation pipeline to the Cryptococcus deuterogattii genome reveals dynamic primary metabolic gene clusters and genomic impact of RNAi loss.

Patrícia Aline Gröhs Ferrareze, Corinne Maufrais, Rodrigo Silva Araujo Streit, Shelby J Priest, Christina A Cuomo, Joseph Heitman, Charley Christian Staats, Guilhem Janbon

G3 (Bethesda)

Département de Mycologie, Institut Pasteur, Unité Biologie des ARN des Pathogènes Fongiques, F-75015 Paris, France.

Published: February 2021

The study optimizes a bioinformatics pipeline for annotating complex fungal genomes from RNA-seq data, using the manually annotated pathogenic yeasts Cryptococcus neoformans and Cryptococcus deneoformans as test cases.
The quality of the annotation is heavily influenced by the quantity of RNA-seq reads, with optimal results achieved at 5-10 million reads per replicate; the number of predicted introns serves as an effective indicator of annotation quality.
Dynamic transcriptome analysis of the RNAi-deficient species Cryptococcus deuterogattii shows more prominent intron retention than in the RNAi-proficient species C. neoformans and C. deneoformans, while comparative gene content analysis identifies 21 lost clusters enriched in transcription factors and transporters and points to possible adaptive diversification of metabolite assimilation.

Evaluating the quality of a de novo annotation of a complex fungal genome based on RNA-seq data remains a challenge. In this study, we sequentially optimized a Cufflinks-CodingQuary-based bioinformatics pipeline fed with RNA-seq data using the manually annotated model pathogenic yeasts Cryptococcus neoformans and Cryptococcus deneoformans as test cases. Our results show that the quality of the annotation is sensitive to the quantity of RNA-seq data used and that the best quality is obtained with 5-10 million reads per RNA-seq replicate. We also showed that the number of introns predicted is an excellent a priori indicator of the quality of the final de novo annotation. We then used this pipeline to annotate the genome of the RNAi-deficient species Cryptococcus deuterogattii strain R265 using RNA-seq data. Dynamic transcriptome analysis revealed that intron retention is more prominent in C. deuterogattii than in the RNAi-proficient species C. neoformans and C. deneoformans. In contrast, we observed that antisense transcription was not higher in C. deuterogattii than in the two other Cryptococcus species. Comparative gene content analysis identified 21 clusters enriched in transcription factors and transporters that have been lost. Interestingly, analysis of the subtelomeric regions in these three annotated species identified a similar gene enrichment, reminiscent of the structure of primary metabolic clusters. Our data suggest that there is active exchange between subtelomeric regions, and that other chromosomal regions might participate in adaptive diversification of Cryptococcus metabolite assimilation potential.
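
The abstract proposes the number of predicted introns as an a priori indicator of annotation quality. The following is a minimal sketch of how such a count could be derived from a GFF3 annotation. It is not the authors' pipeline: the function name `count_introns` is hypothetical, and it assumes that exons appear as `exon` features carrying a `Parent` attribute that names their transcript.

```python
from collections import defaultdict


def count_introns(gff_lines):
    """Count distinct introns implied by the exon features in GFF3 lines.

    Assumption (not from the source): every exon row has a Parent=<id>
    attribute; an intron is the gap between two consecutive exons of the
    same transcript.
    """
    # (seqid, strand, transcript_id) -> list of (start, end) exon spans
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only well-formed exon rows
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        # An exon shared by several transcripts lists comma-separated parents.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[(cols[0], cols[6], parent)].append(
                    (int(cols[3]), int(cols[4]))
                )

    introns = set()  # a set, so introns shared between isoforms count once
    for (seqid, strand, _), spans in exons.items():
        spans.sort()
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            if right_start > left_end + 1:  # a real gap between the exons
                introns.add((seqid, left_end + 1, right_start - 1, strand))
    return len(introns)
```

Under these assumptions, one could pass an open annotation file to `count_introns` for annotations built from different read depths and compare the resulting counts; how the count relates to final annotation quality is the paper's finding, not something this sketch establishes.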

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8022950
DOI: http://dx.doi.org/10.1093/g3journal/jkaa070
