Nucleotide conversion RNA sequencing techniques interrogate chemical RNA modifications in cellular transcripts, resulting in mismatch-containing reads. Biases in mapping the resulting reads to reference genomes remain poorly understood. We present splice_sim, a splice-aware RNA-seq simulation and evaluation pipeline that introduces user-defined nucleotide conversions at set frequencies, creates mixture models of converted and unconverted reads, and calculates mapping accuracies per genomic annotation. By simulating nucleotide conversion RNA-seq datasets under realistic experimental conditions, including metabolic RNA labeling and RNA bisulfite sequencing, we measure mapping accuracies of state-of-the-art spliced-read mappers for mouse and human transcripts and derive strategies to prevent biases in the data interpretation.
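The two simulation steps named in the abstract, introducing conversions at a set frequency and mixing converted with unconverted reads, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not splice_sim's actual implementation: the function names `convert_read` and `simulate_mixture`, the default T-to-C conversion (as in metabolic labeling), and the parameter names are all hypothetical.

```python
import random


def convert_read(seq, ref_base="T", alt_base="C", rate=0.05, rng=None):
    """Convert each ref_base to alt_base independently with probability `rate`.

    Hypothetical helper for illustration; returns the new sequence and the
    number of conversions introduced.
    """
    if rng is None:
        rng = random.Random()
    out = []
    n_conv = 0
    for base in seq:
        if base == ref_base and rng.random() < rate:
            out.append(alt_base)
            n_conv += 1
        else:
            out.append(base)
    return "".join(out), n_conv


def simulate_mixture(reads, fraction_converted=0.5, rate=0.05, seed=0):
    """Build a mixture of converted and unconverted reads.

    Each read is assigned to the "converted" population with probability
    `fraction_converted`; only those reads are passed through convert_read.
    Returns a list of (sequence, is_converted_population, n_conversions).
    """
    rng = random.Random(seed)
    mixture = []
    for seq in reads:
        if rng.random() < fraction_converted:
            new_seq, n_conv = convert_read(seq, rate=rate, rng=rng)
            mixture.append((new_seq, True, n_conv))
        else:
            mixture.append((seq, False, 0))
    return mixture
```

Keeping the true population label and conversion count alongside each simulated read is what makes a later accuracy evaluation possible: the mapper's output can be compared against a known ground truth.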

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11514792
DOI: http://dx.doi.org/10.1186/s13059-024-03313-8

Similar Publications

Parkinson's disease (PD) is a neurodegenerative disorder characterized by dopaminergic neuron degeneration and α-synuclein (α-syn) aggregation. Lipid metabolism dysfunction may contribute to PD progression. This study aims to identify lipid metabolism-related genes (LMGs) associated with PD using an integrative transcriptomic analysis of microarray and single-cell RNA sequencing (scRNA-seq) datasets from patients with PD and healthy controls.

Single-cell RNA sequencing (scRNA-seq) offers remarkable insights into cellular development and differentiation by capturing the gene expression profiles of individual cells. The role of dimensionality reduction and visualization in the interpretation of scRNA-seq data has gained wide acceptance. However, current methods face several challenges, including incomplete structure-preserving strategies and high distortion in embeddings, which fail to effectively model complex cell trajectories with multiple branches.

Introduction: Biomarkers play a crucial role across various fields by providing insights into biological responses to interventions. High-throughput gene expression profiling technologies facilitate the discovery of data-driven biomarkers through extensive datasets. This study focuses on identifying biomarkers in gene expression data related to chemical injuries by mustard gas, covering a spectrum from healthy individuals to severe injuries.

Background: MARVEL domain-containing 1 (MARVELD1) has been implicated in the progression of several cancers, but its role in pancreatic adenocarcinoma (PAAD) remains poorly understood.

Methods: RNA-seq data from the TCGA-PAAD and GTEx-Pancreas cohorts were analyzed to assess MARVELD1 expression. Stable MARVELD1 knockdown and overexpression were conducted in BxPC3 and PANC-1 cells.

Background: Pancreatic cancer is one of the most malignant abdominal tumors. DDX60 has been shown to be associated with a variety of tumor biological processes. However, DDX60 in pancreatic cancer has not been reported.
