Enhancing RNA-seq bias mitigation with the Gaussian self-benchmarking framework: towards unbiased sequencing data.

BMC Genomics

Faculty of Synthetic Biology, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Shenzhen University of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.

Published: September 2024

Background: RNA sequencing is a vital technique for analyzing RNA behavior in cells, but it often suffers from various biases that distort the data. Traditional methods to address these biases are typically empirical and handle them individually, limiting their effectiveness. Our study introduces the Gaussian Self-Benchmarking (GSB) framework, a novel approach that leverages the natural distribution patterns of guanine (G) and cytosine (C) content in RNA to mitigate multiple biases simultaneously. This method is grounded in a theoretical model, organizing k-mers based on their GC content and applying a Gaussian model for alignment to ensure empirical sequencing data closely match their theoretical distribution.

Results: The GSB framework demonstrated superior performance in mitigating sequencing biases compared to existing methods. Testing with synthetic RNA constructs and real human samples showed that the GSB approach not only addresses individual biases more effectively but also manages co-existing biases jointly. The framework's reliance on accurately pre-determined parameters like mean and standard deviation of GC content distribution allows for a more precise representation of RNA samples. This results in improved accuracy and reliability of RNA sequencing data, enhancing our understanding of RNA behavior in health and disease.

Conclusions: The GSB framework presents a significant advancement in RNA sequencing analysis by providing a well-validated, multi-bias mitigation strategy. It functions independently from previously identified dataset flaws and sets a new standard for unbiased RNA sequencing results. This development enhances the reliability of RNA studies, broadening the potential for scientific breakthroughs in medicine and biology, particularly in genetic disease research and the development of targeted treatments.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11441123PMC
http://dx.doi.org/10.1186/s12864-024-10814-0DOI Listing

Publication Analysis

Top Keywords

rna sequencing
16
sequencing data
12
gsb framework
12
rna
10
gaussian self-benchmarking
8
rna behavior
8
reliability rna
8
sequencing
7
biases
6
enhancing rna-seq
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!