A deep learning-based method for the prediction of DNA interacting residues in a protein.

Brief Bioinform

Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India.

Published: September 2022

DNA-protein interaction is one of the most crucial interactions in biological systems; it governs processes such as transcription, gene regulation and splicing. In this study, we trained our models on a training dataset of 646 DNA-binding proteins containing 15 636 DNA-interacting and 298 503 non-interacting residues. The trained models were evaluated on an independent dataset of 46 DNA-binding proteins containing 965 DNA-interacting and 9911 non-interacting residues. All proteins in the independent dataset have less than 30% sequence similarity with proteins in the training dataset. We developed a range of models based on traditional machine learning and deep learning (1D-CNN) techniques, using binary profiles, physicochemical properties and Position-Specific Scoring Matrix (PSSM)/evolutionary profiles. Among the machine learning techniques, the eXtreme Gradient Boosting-based model achieved a maximum area under the receiver operating characteristic curve (AUROC) of 0.77 on the independent dataset using the PSSM profile. The deep learning-based model achieved the highest AUROC, 0.79, on the independent dataset using a combination of all three profiles. We also evaluated existing methods on the independent dataset and observed that our proposed method outperformed all of them. To support the scientific community, we developed standalone software and a web server, which are accessible from https://webs.iiitd.edu.in/raghava/dbpred.
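As an illustration of two ideas named in the abstract, the following is a minimal sketch of a binary (one-hot) profile computed over a sliding window of residues, and of AUROC computed from per-residue scores. It is not the authors' implementation: the window size, the padding symbol `X`, and the function names `residue_windows`, `binary_profile` and `auroc` are all assumptions made for this example, since the abstract does not specify them.

```python
# Illustrative sketch only; names and parameters are hypothetical.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for positions beyond the termini
ALPHABET = AMINO_ACIDS + PAD


def residue_windows(sequence, window=5):
    """Return one window of length `window` centred on each residue."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    padded = PAD * half + sequence.upper() + PAD * half
    return [padded[i:i + window] for i in range(len(sequence))]


def binary_profile(window_seq):
    """One-hot encode a window: 21 values per position, flattened."""
    vector = []
    for aa in window_seq:
        # Unknown symbols are mapped to the padding symbol (an assumption).
        symbol = aa if aa in AMINO_ACIDS else PAD
        vector.extend(1 if symbol == s else 0 for s in ALPHABET)
    return vector


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen interacting residue
    scores higher than a randomly chosen non-interacting one (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    windows = residue_windows("MKR", window=3)
    features = [binary_profile(w) for w in windows]
    print(windows, len(features[0]))
    # Toy labels (1 = DNA-interacting) and made-up scores, for illustration.
    print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

The pairwise AUROC here is quadratic in the number of residues and is meant only to make the definition concrete; a rank-based routine from a standard library would be the practical choice at the dataset sizes reported above.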

Source
http://dx.doi.org/10.1093/bib/bbac322

Similar Publications

Background: Although previous studies have investigated the risk factors for rotator cuff syndrome (RCS), there remains controversy due to uncontrolled and uncertain confounding factors in their analyses.

Purpose: To perform Mendelian randomization (MR) analysis using single-nucleotide polymorphisms to investigate the causal relationship between RCS and 4 risk factors: type 2 diabetes mellitus (T2DM), high blood pressure (HBP), body mass index (BMI), and low high-density lipoprotein cholesterol (HDL-C).

Study Design: Descriptive epidemiology study.

Adipose tissue may not be a major player in the inflammatory pathogenesis of Autism Spectrum Disorder.

Brain Behav Immun Health

February 2025

Institute of Maternal and Child Medicine, Shenzhen Maternity and Child Healthcare Hospital, Southern Medical University, Shenzhen, China.

Purpose: Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder increasingly recognized for its strong association with chronic inflammation. Adipose tissue functions as an endocrine organ and can secrete inflammatory cytokines to mediate inflammation. However, its involvement in ASD-related inflammation remains unclear.

Environmental Variation Influences Genome Evolution in Hispaniolan Trunk Anoles (Anolis distichus).

Mol Ecol

January 2025

Department of Ecology and Evolutionary Biology, Biodiversity Institute, University of Kansas, Lawrence, Kansas, USA.

Environmental variation often drives evolutionary processes like population differentiation, local adaptation and speciation. We used genome-scale data to investigate the contribution of environmental variation to evolution of the North Caribbean bark anole (Anolis distichus), a widespread common lizard that exhibits impressive phenotypic variation across varying habitats on the island of Hispaniola. We obtained new double-digest restriction-associated DNA sequence data (ddRADseq) from nearly 200 individuals and used 53 GIS data layers representing a range of environmental variables.

Background: Advanced gastric cancer (GC) exhibits a high recurrence rate and a dismal prognosis. Myocyte enhancer factor 2c (MEF2C) was found to contribute to the development of various types of cancer. Therefore, our aim is to develop a prognostic model that predicts the prognosis of GC patients and initially explore the role of MEF2C in immunotherapy for GC.

Soil erosion susceptibility maps and raster dataset for the hydrological basins of North Africa.

Sci Data

January 2025

University of Southern California, Viterbi School of Engineering, 3737 Watt Way, Powell Hall of Engineering, Los Angeles, CA, 90089, USA.

Soil erosion in North Africa modulates agricultural and urban developments as well as the impacts of flash floods. Existing investigations and associated datasets are mainly performed in localized urban areas, often representing a limited part of a watershed. The above compromises the implementation of mitigation measures for this vast area under accentuating extremes and continuous hydroclimatic fluctuations.
