Machine Learning Techniques for Classifying the Mutagenic Origins of Point Mutations.

Genetics

Research School of Biology, The Australian National University, Canberra, Australian Capital Territory 2601, Australia

Published: May 2020

AI Article Synopsis

  • There is growing interest in creating diagnostic tools to differentiate between various mutagenic mechanisms, especially in cancer research and population studies.
  • Researchers evaluated the origin of point mutations by comparing spontaneous mutations in mice to those caused by the chemical mutagen ENU, finding notable similarities that complicate classification.
  • Using a new modeling approach and machine learning, the study achieved high accuracy in distinguishing mutation types based on neighboring base sequences, suggesting this method could be applied more broadly to mutation classification across different contexts.

Article Abstract

There is increasing interest in developing diagnostics that discriminate individual mutagenic mechanisms in a range of applications that include identifying population-specific mutagenesis and resolving distinct mutation signatures in cancer samples. Analyses for these applications assume that mutagenic mechanisms have a distinct relationship with neighboring bases that allows them to be distinguished. Direct support for this assumption is limited to a small number of simple cases, e.g., CpG hypermutability. We have evaluated whether the mechanistic origin of a point mutation can be resolved using only sequence context for a more complicated case. We contrasted single nucleotide variants originating from the multitude of mutagenic processes that normally operate in the mouse germline with those induced by the potent mutagen N-ethyl-N-nitrosourea (ENU). The considerable overlap in the mutation spectra of these two samples makes this a challenging problem. Employing a new, robust log-linear modeling method, we demonstrate that neighboring bases contain information regarding point mutation direction that differs between the ENU-induced and spontaneous mutation variant classes. A logistic regression classifier exhibited strong performance at discriminating between the different mutation classes. Concordance between the feature set of the best classifier and information content analyses suggests our results can be generalized to other mutation classification problems. We conclude that machine learning can be used to build a practical classification tool to identify the mutation mechanism for individual genetic variants. Software implementing our approach is freely available under an open-source license.
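
The idea of classifying a variant from its neighboring bases with logistic regression can be illustrated with a minimal sketch. This is not the authors' software or feature set: the one-hot encoding of four flanking positions, the function names, the training settings, and above all the data are assumptions made for illustration. The contexts are synthetic and follow a made-up rule, so the printed accuracy says nothing about real ENU-induced or spontaneous variants.

```python
import math
import random

BASES = "ACGT"

def one_hot_context(context):
    """Encode flanking bases as a flat 0/1 vector, four slots per position."""
    vec = []
    for base in context:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that a context belongs to class 1."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Plain batch gradient descent on the logistic log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def make_toy_data(n, rng):
    """SYNTHETIC data: class 1 favours 'A' at the -1 position (invented rule)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        context = [rng.choice(BASES) for _ in range(4)]  # positions -2, -1, +1, +2
        if label == 1 and rng.random() < 0.8:
            context[1] = "A"
        X.append(one_hot_context("".join(context)))
        y.append(label)
    return X, y

rng = random.Random(0)  # fixed seed so the toy run is repeatable
X_train, y_train = make_toy_data(400, rng)
X_test, y_test = make_toy_data(200, rng)
weights, bias = fit_logistic(X_train, y_train)

# Fraction of held-out toy contexts assigned to the correct class.
accuracy = sum(
    (predict_proba(weights, bias, x) >= 0.5) == bool(label)
    for x, label in zip(X_test, y_test)
) / len(y_test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

With real variants, each row would instead be built from the bases flanking an observed mutation, and the fitted weights would indicate which positions and bases carry class-specific signal.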

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7198283
DOI: http://dx.doi.org/10.1534/genetics.120.303093

Publication Analysis

Top Keywords

machine learning: 8
mutagenic mechanisms: 8
mutation: 8
neighboring bases: 8
point mutation: 8
learning techniques: 4
techniques classifying: 4
mutagenic: 4
classifying mutagenic: 4
mutagenic origins: 4

Similar Publications

Automated Classification of Cardiac Arrhythmia using Short-Duration ECG Signals and Machine Learning.

Biomed Phys Eng Express

January 2025

Electronics and Communication Engineering, Rajiv Gandhi University, Rono Hills, Doimukh, Itanagar, Arunachal Pradesh 791112, India.

Accurate detection of cardiac arrhythmias is crucial for preventing premature deaths. The current study employs a dual-stage Discrete Wavelet Transform (DWT) and a median filter to eliminate noise from ECG signals. Subsequently, ECG signals are segmented, and QRS regions are extracted for further preprocessing.

Eutrophication is one of the most relevant concerns due to the risk to water supply and food security. Nitrogen and phosphorus chemical species concentrations determined the risk and magnitude of eutrophication. These analyses are even more relevant in basins with intensive agriculture due to agrochemical discharges.

Automated and Efficient Sampling of Chemical Reaction Space.

Adv Sci (Weinh)

January 2025

Department of Chemistry, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea.

Machine learning interatomic potentials (MLIPs) promise quantum-level accuracy at classical force field speeds, but their performance hinges on the quality and diversity of training data. An efficient, fully automated approach is presented for sampling chemical reaction space without relying on human intuition, addressing a critical gap in MLIP development. The method combines the speed of tight-binding calculations with selective high-level refinement, generating diverse datasets that capture both equilibrium and reactive regions of potential energy surfaces.

The aim of this study is to address the limitations of convolutional networks in recognizing modulation patterns. These networks are unable to utilize temporal information effectively for feature extraction and modulation pattern recognition, resulting in inefficient modulation pattern recognition. To address this issue, a signal modulation recognition method based on a two-way interactive temporal attention network algorithm has been developed.

Soil spectroscopy is a widely used method for estimating soil properties that are important to environmental and agricultural monitoring. However, a bottleneck to its more widespread adoption is the need for establishing large reference datasets for training machine learning (ML) models, which are called soil spectral libraries (SSLs). Similarly, the prediction capacity of new samples is also subject to the number and diversity of soil types and conditions represented in the SSLs.
