Use of 3D chaos game representation to quantify DNA sequence similarity with applications for hierarchical clustering.

J Theor Biol

Computational Science Research Center, San Diego State University, 5500 Campanile Dr, San Diego, 92182, CA, USA; Department of Mathematics and Statistics, San Diego State University, 5500 Campanile Dr, San Diego, 92182, CA, USA.

Published: January 2025

A 3D chaos game is shown to be a useful way to encode DNA sequences. Because matching subsequences converge in space under 3D chaos game encoding, a DNA sequence's 3D chaos game representation can be used to compare DNA sequences without prior alignment and without truncating or padding any of the sequences. Two proposed methods, inspired by shape-similarity comparison techniques, show that this form of encoding can perform as well as alignment-based techniques for building phylogenetic trees. The first method uses the volume overlap of intersecting spheres; the second uses shape signatures that summarize the coordinates, oriented angles, and oriented distances of the 3D chaos game trajectory. The methods are tested on (1) the first exon of the beta-globin gene for 11 species, (2) mitochondrial DNA from four groups of primates, and (3) a set of synthetic DNA sequences. Simulations show that the proposed methods produce distances that reflect the number of mutation events; in addition, distances resulting from deletion mutations are, on average, comparable to those produced by substitution mutations.
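The convergence property mentioned in the abstract can be illustrated with a minimal sketch of a 3D chaos game. The abstract does not give the encoding details, so the following are assumptions made only for illustration: the four nucleotides sit at the corners of a regular tetrahedron, each step moves the current point halfway toward the corner of the next nucleotide, and the walk starts at the origin. The names `VERTICES`, `cgr3d`, and `endpoint_gap` are hypothetical and are not taken from the paper.

```python
from math import dist

# ASSUMPTION: corners of a regular tetrahedron centred at the origin.
# The paper may assign nucleotides to different coordinates.
VERTICES = {
    "A": (1.0, 1.0, 1.0),
    "C": (1.0, -1.0, -1.0),
    "G": (-1.0, 1.0, -1.0),
    "T": (-1.0, -1.0, 1.0),
}


def cgr3d(sequence, ratio=0.5, start=(0.0, 0.0, 0.0)):
    """Return the 3D chaos game trajectory, one point per nucleotide.

    ASSUMPTION: a contraction ratio of 0.5 and a start at the origin.
    """
    point = start
    trajectory = []
    for base in sequence.upper():
        vertex = VERTICES[base]  # KeyError for symbols other than A, C, G, T
        # Move the current point a fraction `ratio` of the way to the vertex.
        point = tuple(p + ratio * (v - p) for p, v in zip(point, vertex))
        trajectory.append(point)
    return trajectory


def endpoint_gap(seq_a, seq_b):
    """Euclidean distance between the final points of two trajectories."""
    return dist(cgr3d(seq_a)[-1], cgr3d(seq_b)[-1])


if __name__ == "__main__":
    # Two sequences that differ only in their first base and then share
    # a growing suffix: the gap between endpoints halves with each shared base.
    suffix = "GATTACA"
    for k in range(len(suffix) + 1):
        gap = endpoint_gap("A" + suffix[:k], "T" + suffix[:k])
        print(k, gap)
```

Under these assumptions, every shared trailing nucleotide halves the distance between the two endpoints, which is one way to see why matching subsequences end up close together in space regardless of sequence length. The paper's sphere-overlap and shape-signature distances operate on trajectories of this kind; they are not reproduced here.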

Source
http://dx.doi.org/10.1016/j.jtbi.2024.111972

Publication Analysis

Top Keywords

chaos game: 20
dna sequences: 12
game representation: 8
game encoding: 8
encoding dna: 8
proposed methods: 8
dna: 7
chaos: 5
representation quantify: 4
quantify dna: 4

Similar Publications

Improving flood-prone areas mapping using geospatial artificial intelligence (GeoAI): A non-parametric algorithm enhanced by math-based metaheuristic algorithms.

J Environ Manage

January 2025

Dept. of Computer Science & Engineering and Convergence Engineering for Intelligent Drone, XR Research Center, Sejong University, Seoul, Republic of Korea.

Flooding presents substantial dangers to human lives and infrastructure, underscoring the need to map flood-prone areas precisely so that effective mitigation measures can be implemented. Although machine learning algorithms have made great strides, their accuracy in flood susceptibility mapping (FSM) remains limited by data dependence, interpretability and explainability issues, overfitting, generalization difficulties, and hyperparameter tuning. This study suggests combining the Decision Tree (DT) algorithm with advanced, math-based metaheuristic optimization algorithms to address these limitations.

Generosity through donation plays a crucial role in reducing inequality and influencing human behavior. However, previous research on donation has overlooked individuals' acceptance of the extent of inequality, which acts as a trigger for donation. To address this gap, this paper systematically explores the impact of donation based on inequality tolerance on the evolution of cooperation in the spatial public goods game.

On-pitch rehabilitation is a crucial part of returning to sport after injury in elite soccer. The () initially offered a framework for practitioners to plan on-pitch rehabilitation, focusing on physical preparation and sport specificity. However, our experiences with the , combined with recent research in injury neurophysiology, point to a need for an updated model that integrates practice design and physical-cognitive interactions.

Physics-informed Neural Implicit Flow neural network for parametric PDEs.

Neural Netw

January 2025

Defense Innovation Institute, Chinese Academy of Military Science, Beijing 100071, China; Intelligent Game and Decision Laboratory, China.

The Physics-informed Neural Network (PINN) has been a popular method for solving partial differential equations (PDEs) due to its flexibility. However, PINN still faces challenges in characterizing spatio-temporal correlations when solving parametric PDEs due to network limitations. To address this issue, we propose a Physics-Informed Neural Implicit Flow (PINIF) framework, which enables a meshless low-rank representation of the parametric spatio-temporal field based on the expressiveness of the Neural Implicit Flow (NIF).

MVSLLnc: LncRNA subcellular localization prediction based on multi-source features and two-stage voting strategy.

Methods

January 2025

National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China.

The subcellular localization of long non-coding RNAs (lncRNAs) is crucial for understanding their function. Because traditional biological experimental methods are time-consuming and some existing computational methods rely on high computing power, we sought a simple, easy-to-implement method for predicting the subcellular localization of lncRNAs more efficiently. In this work, we propose MVSLLnc, a model based on multi-source features and a two-stage voting strategy for predicting the subcellular localization of lncRNAs.
