Fast prediction of protein domain boundaries using conserved local patterns.

J Mol Model

Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India.

Published: September 2006

Using a data-driven approach, we have found conserved motifs and secondary structural patterns in the vicinity of interior domain boundary points (dbps), without any a priori constraint on the type and number of such features and without any requirement of sequence homology. We have used these motifs and patterns to rerank the solutions obtained by the well-known domain guess by size (DGS) algorithm. Overall, we predict five solutions. Our method, domain boundary prediction using conserved patterns (DPCP), improves the average accuracy of the top five solutions of DGS from 71.74 to 82.88 % for two-continuous-domain proteins, and from 21.38 to 80.56 % for two-discontinuous-domain proteins. Considering only the top solution, accuracy rises from 0 to 72.74 % for two-continuous-domain proteins with chain lengths up to 300 residues, and from 0 to 62.85 % for those with up to 400 residues. For discontinuous domains, the top_min solutions of DPCP (the minimum number of solutions required for predicting all dbps of a protein) improve the average accuracy of DGS prediction from 12.5 to 76.3 % in proteins with chain lengths up to 300 residues, and from 13.33 to 70.84 % for proteins with up to 400 residues. In our validation experiments, DPCP also outperformed domain identification from secondary structure element alignment (DomSSEA), the best method reported so far for efficient prediction of domain boundaries using predicted secondary structure. The average accuracies of the topmost solution of DomSSEA are 61 and 52 % for proteins with up to 300 and 400 residues, respectively, in the case of continuous domains; the corresponding accuracies for the discontinuous case are 28 and 21 %.
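The reranking and top-k accuracy figures quoted in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy scores, and in particular the tolerance window of ±20 residues, which the abstract does not state. It is not the DPCP scoring scheme, only a generic way to rerank candidate boundaries by some pattern score and to count a protein as correctly predicted when one of its first k candidates falls near the true boundary.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(candidates: Sequence[int],
           pattern_score: Callable[[int], float]) -> List[int]:
    """Reorder candidate boundary positions by descending score.

    `pattern_score` is a hypothetical stand-in for whatever evidence
    (motifs, secondary structural patterns) supports a boundary at a
    given residue position. Ties keep their original order.
    """
    return sorted(candidates, key=pattern_score, reverse=True)


def top_k_hit(ranked: Sequence[int], true_boundary: int,
              k: int = 5, tolerance: int = 20) -> bool:
    """True if any of the first k candidates lies within `tolerance`
    residues of the true boundary (the tolerance is an assumption)."""
    return any(abs(c - true_boundary) <= tolerance for c in ranked[:k])


def top_k_accuracy(proteins: Sequence[Tuple[Sequence[int], int]],
                   k: int = 5, tolerance: int = 20) -> float:
    """Percentage of proteins with a hit among their first k candidates.

    Each entry of `proteins` is (ranked_candidates, true_boundary).
    """
    if not proteins:
        raise ValueError("need at least one protein")
    hits = sum(top_k_hit(c, t, k, tolerance) for c, t in proteins)
    return 100.0 * hits / len(proteins)


# Toy data: two single-boundary proteins with made-up candidates.
toy = [([50, 120, 200], 118), ([80, 150], 300)]
toy_scores = {50: 0.1, 120: 0.9, 200: 0.4}  # hypothetical scores
reranked = rerank(toy[0][0], lambda pos: toy_scores[pos])
```

On the toy data, the original ordering of the first protein misses the boundary at rank 1 but hits it within the first five candidates, while the reranked ordering puts the near-correct candidate first. This is the kind of gain that the top-solution figures in the abstract measure.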

Source
http://dx.doi.org/10.1007/s00894-006-0116-0
