Motivation: There are many clustered transcriptionally active regions in the human genome, in which the transcription complex cannot immediately terminate transcription at the upstream gene termination site, but instead continues to transcribe intergenic regions and downstream genes, resulting in read-through transcripts. Several studies have demonstrated the regulatory roles of read-through transcripts in tumorigenesis and development. However, limited by the read length of next-generation sequencing, discovery of read-through transcripts has been slow. For long but also erroneous third-generation sequencing data, this study developed a novel minimizer sketch algorithm to accurately and quickly identify read-through transcripts.
Results: Readon initially splits the reference sequence into distinct active regions. It employs a sliding window approach within each region, calculates minimizers, and constructs the specialized structured arrays for query indexing. Following initial alignment anchor screening of candidate read-through transcripts, further confirmation steps are executed. Comparative assessments against existing software reveal Readon's superior performance on both simulated and validated real data. Additionally, two downstream tools are provided: one for predicting whether a read-through transcript is likely to undergo nonsense-mediated decay or encodes a protein, and another for visualizing splicing patterns.
Availability And Implementation: Readon is freely available on GitHub (https://github.com/Bulabula45/Readon).
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11162696 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btae336 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!