A novel representation of genomic sequences for taxonomic clustering and visualization by means of self-organizing maps.

Bioinformatics

Department of Information Structure and Organization, Universidad Politécnica (UPM), Madrid 28031, Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, CITIC, Campanillas, Malaga 29590, Spain, Department of Molecular Evolution, Centro de Astrobiología (CSIC-INTA), Torrejón de Ardoz, Madrid 28850 and Centro de Investigación Biomédica en Red de enfermedades hepáticas y digestivas (CIBERehd), Barcelona 08036, Spain.

Published: March 2015

AI Article Synopsis

  • Self-organizing maps (SOMs) are effective tools in bioinformatics for clustering and visualizing high-dimensional genomic data, but require innovative methods to convert nucleotide sequences into numerical vectors to handle complexities like ambiguities and alignment gaps.
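  • Six different coding variants based on Euclidean space were tested on two SOM models (Kohonen's classical SOM and growing cell structures) with small sub-unit ribosomal RNA sequences and HIV-1 pol gene sequences, showing that the extra weight assigned to alignment gaps is the factor that most strongly influences clustering accuracy.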
  • Although the coding methods yielded varying levels of taxonomic accuracy, they aligned well with established phylogenetic analyses, indicating the potential for widespread application in genomic research.

Article Abstract

Motivation: Self-organizing maps (SOMs) are readily available bioinformatics methods for clustering and visualizing high-dimensional data, provided that the biological information is first transformed into fixed-size, metric-based vectors. To increase the usefulness of SOM-based approaches for the analysis of genomic sequence data, novel representation methods are required that automatically and objectively transform aligned nucleotide sequences into numeric vectors, dealing with both nucleotide ambiguity and gaps derived from sequence alignment.
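The transformation described above can be illustrated with a minimal sketch. The abstract does not specify the six codification variants, so everything below is an assumption made for illustration only: each alignment column becomes a five-component vector (A, C, G, T, gap), an IUPAC ambiguity code spreads its weight evenly over the bases it may represent, and a hypothetical `gap_weight` parameter scales the gap component. The function names `encode_site`, `encode_sequence` and `euclidean` are likewise invented here.

```python
# Illustrative encoding only; NOT the codification defined in the paper.
# IUPAC nucleotide codes mapped to the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = "ACGT"


def encode_site(symbol, gap_weight=1.0):
    """Return a 5-component vector (A, C, G, T, gap) for one alignment column."""
    symbol = symbol.upper()
    if symbol in ("-", "."):
        # Alignment gap: all weight goes to the gap axis (assumed scheme).
        return [0.0, 0.0, 0.0, 0.0, gap_weight]
    options = IUPAC[symbol]
    share = 1.0 / len(options)  # ambiguity spread evenly over candidate bases
    return [share if base in options else 0.0 for base in BASES] + [0.0]


def encode_sequence(aligned, gap_weight=1.0):
    """Concatenate per-column vectors into one fixed-size vector."""
    vector = []
    for symbol in aligned:
        vector.extend(encode_site(symbol, gap_weight))
    return vector


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Because all sequences come from the same alignment, every encoded vector has the same length (five times the number of columns), which is what a SOM requires. Under this assumed scheme, raising `gap_weight` above 1 makes a base-versus-gap mismatch more distant than a base-versus-base mismatch.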

Results: Six different codification variants based on Euclidean space, as is SOM processing itself, have been tested using two SOM models: Kohonen's classical SOM and growing cell structures. They have been applied to two different sets of sequences: 32 sequences of small sub-unit ribosomal RNA from organisms belonging to the three domains of life, and 44 sequences of the reverse transcriptase region of the pol gene of human immunodeficiency virus type 1 belonging to different groups and sub-types. Our results show that the most important factor affecting the accuracy of sequence clustering is the assignment of an extra weight to the presence of alignment-derived gaps. Although each of the codification variants shows a different level of taxonomic consistency, the results are in agreement with sequence-based phylogenetic reconstructions and anticipate a broad applicability of this codification method.
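For readers unfamiliar with the first of the two models, the following is a minimal sketch of a Kohonen-style SOM operating on fixed-size numeric vectors with Euclidean distance. It is not the authors' implementation, and it does not cover growing cell structures. The grid size, the linearly decaying learning rate and radius, the Gaussian neighbourhood, and the names `train_som` and `best_matching_unit` are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position (row, col) of the unit whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for pos, w in weights.items():
        d = sum((a - b) ** 2 for a, b in zip(w, x))  # squared Euclidean distance
        if d < best_d:
            best, best_d = pos, d
    return best


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1) for every grid unit.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)   # assumed floor on the radius
        for x in rng.sample(data, len(data)):       # shuffled pass over the data
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull unit toward the sample
    return weights
```

After training, each encoded sequence can be assigned to its best matching unit, and sequences that share a unit or occupy neighbouring units form the clusters that would then be compared against a known taxonomy.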

Source
http://dx.doi.org/10.1093/bioinformatics/btu708
