Motivation: Genome sequencing technologies generate vast numbers of genomic sequences. Neural network-based methods are prime candidates for extracting insights from these sequences because they apply to large and diverse datasets. However, the highly variable lengths of genome sequences make it difficult to present them as input to a neural network. Genetic variations further complicate tasks that involve sequence comparison or alignment.

Results: Inspired by the theory and applications of "spaced seeds," we propose a graph representation of genome sequences called the "gapped pattern graph." These graphs can be transformed through a Graph Convolutional Network to form lower-dimensional embeddings for downstream tasks. On the basis of gapped pattern graphs, we implemented a neural network model and demonstrated its performance on diverse tasks involving microbial and mammalian genome data. Our method consistently outperformed the other state-of-the-art methods across various metrics on all tasks, especially for sequences with limited homology to the training data. In addition, our model was able to identify distinct gapped pattern signatures from the sequences.
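
As an illustration of the idea, the sketch below builds a fixed-size graph from a sequence of any length and applies one graph-convolution step to it. The construction is an assumption made for this example and is not taken from the abstract: nodes are all k-mers over the alphabet, and an edge (u, v) is weighted by how often k-mer u is followed by k-mer v after `gap` skipped bases. The function names `gapped_pattern_graph`, `to_dense`, and `gcn_layer` are hypothetical, and the layer uses the standard symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W), which may differ from the model in the GCNFrame repository.

```python
import math
from collections import Counter
from itertools import product


def gapped_pattern_graph(seq, k=2, gap=1, alphabet="ACGT"):
    """Assumed construction: nodes are all k-mers; weights[(u, v)] is the
    relative frequency of k-mer u followed by k-mer v after `gap` bases."""
    nodes = ["".join(p) for p in product(alphabet, repeat=k)]
    valid = set(nodes)
    counts = Counter()
    span = 2 * k + gap
    for i in range(len(seq) - span + 1):
        u = seq[i:i + k]
        v = seq[i + k + gap:i + span]
        if u in valid and v in valid:  # skip windows with ambiguous bases
            counts[(u, v)] += 1
    total = sum(counts.values())
    weights = {e: c / total for e, c in counts.items()} if total else {}
    return nodes, weights


def to_dense(nodes, weights):
    """Symmetrized dense adjacency matrix (edge direction is dropped here)."""
    index = {n: i for i, n in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for (u, v), w in weights.items():
        adj[index[u]][index[v]] += w
        if u != v:
            adj[index[v]][index[u]] += w
    return adj


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 because of self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, x) for x in row] for row in out]


# Sequences of different lengths map to graphs with the same node set.
nodes_a, w_a = gapped_pattern_graph("ACGTACGTTGCA", k=1, gap=1)
nodes_b, w_b = gapped_pattern_graph("ACGTACGTTGCAACGGTCA" * 5, k=1, gap=1)
adj = to_dense(nodes_a, w_a)
feats = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
proj = [[0.5, -0.5], [0.25, 0.75], [1.0, 0.0], [0.0, 1.0]]  # arbitrary weights
embedding = gcn_layer(adj, feats, proj)  # 4 nodes x 2 features
```

Because the node set depends only on `k` and the alphabet, the graph has the same size for every input sequence, which is one way a representation of this kind can sidestep the variable-length problem described under Motivation.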

Availability and Implementation: The framework is available at https://github.com/deepomicslab/GCNFrame.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11034989
DOI: http://dx.doi.org/10.1093/bioinformatics/btae188
