SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.

J Bioinform Comput Biol

Department of AI and Bioinformatics, Nanjing Chengshi Biopharmaceutical (TheraRNA) Co., Ltd., Nanjing, P. R. China.

Published: October 2024

AI Article Synopsis

  • Genetic mutations can disrupt cellular signaling pathways and lead to cancer by creating abnormal proteins not found in the normal human body, which could serve as potential drug targets.
  • Current sequencing tools focus mainly on point mutations, struggle to detect larger and more complex mutations, and do not provide protein-level insights.
  • The Sequencing Analysis Kit (SAKit) is a new bioinformatics tool that combines long-read and short-read RNA sequencing data to effectively identify and validate both large and small genetic variations in human and mouse studies.

Article Abstract

Genetic mutations that cause the inactivation or aberrant activation of essential proteins may alter or disrupt cellular signaling pathways, culminating in precancerous lesions and cancer. Such mutations and dysfunctions can generate "novel proteins" that are not part of the conventional human proteome. Identifying these proteins has considerable potential for uncovering promising drug targets and designing innovative therapeutic models. Although the widespread adoption of nucleotide sequencing has produced diverse tools for detecting DNA or RNA variants, these methods primarily target point mutations and perform suboptimally on large-scale and combinatorial mutations. In addition, their output is confined to the genome and transcriptome levels and does not include the protein information that results from the genetic alterations. We present the Sequencing Analysis Kit (SAKit), a bioinformatics pipeline for hybrid sequencing analysis that integrates long-read and short-read RNA sequencing data. Long reads are used to detect large-scale variations such as gene fusions, exon skipping, intron retention, and aberrant expression in non-coding regions, owing to their excellent coverage, and short reads validate these findings at breakpoints and splice junctions. Conversely, short reads are used to identify small-scale variations, including single nucleotide variants, deletions, and insertions, owing to their superior sequencing depth, with long reads providing additional validation. SAKit runs its analyses from inter-species configuration files comprising genome references and annotation data, making it applicable to both human and mouse studies. Furthermore, SAKit implements a hierarchical filtering approach to eliminate low-confidence variants and employs open reading frame (ORF) analysis to translate identified variants into protein sequences. SAKit is thus a robust and versatile tool for the comprehensive identification of both large-scale and small-scale variants from RNA-seq data, facilitating the discovery of novel proteins and offering a powerful solution for researchers in genomics and transcriptomics. SAKit is freely accessible and open-source, available through GitHub (https://github.com/therarna/SAKit) and as a Docker image (https://hub.docker.com/repository/docker/therarna). Implemented primarily within a Snakemake framework using Python, SAKit ensures reproducibility, scalability, and ease of use for the scientific community.
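The ORF analysis step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not SAKit's actual code: the names `find_orfs` and `novel_proteins`, the `min_aa` threshold, and the toy sequences are all hypothetical, and the sketch assumes the standard genetic code, forward-strand scanning only, and ORFs that run from an ATG to the first in-frame stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_orfs(transcript, min_aa=2):
    """Return (start, end, protein) for each forward-strand ORF.

    Hypothetical helper: an ORF here starts at an ATG and ends at the
    first in-frame stop codon; ORFs without a stop codon are ignored.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        protein = []
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            aa = CODON_TABLE.get(codon, "X")  # "X" marks ambiguous codons
            if start is None:
                if codon == "ATG":
                    start = i
                    protein = ["M"]
            elif aa == "*":
                if len(protein) >= min_aa:
                    orfs.append((start, i + 3, "".join(protein)))
                start = None
            else:
                protein.append(aa)
    return orfs


def novel_proteins(reference, variant, min_aa=2):
    """Proteins encoded by the variant transcript but not the reference."""
    ref = {p for _, _, p in find_orfs(reference, min_aa)}
    var = {p for _, _, p in find_orfs(variant, min_aa)}
    return var - ref


# Toy example (made-up sequences): a single A>G change turns AAA (K) into GAA (E).
reference = "ATGGCCAAATAA"
variant = "ATGGCCGAATAA"
print(find_orfs(reference))                # [(0, 12, 'MAK')]
print(novel_proteins(reference, variant))  # {'MAE'}
```

In this toy case the variant transcript encodes a peptide absent from the reference, which is the sense in which a variant yields a "novel protein". A production pipeline would also need to handle the reverse strand, alternative start codons, and annotation-aware reading frames.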

Source
http://dx.doi.org/10.1142/S0219720024500227

Publication Analysis

Top Keywords

novel proteins: 8
detecting large-scale: 8
sequencing analysis: 8
long-read short-read: 8
sequencing data: 8
long reads: 8
short reads: 8
sakit: 7
sequencing: 6
analysis: 5
