Optimal subset selection of primary sequence features using the genetic algorithm for thermophilic proteins identification.

Biotechnol Lett

Department of Biochemistry and Molecular Biology, College of Life Science, Nankai University, Weijin Road 94, Tianjin, 300071, China

Published: October 2014

A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained on a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. It reached an overall accuracy of 95.4 % in a jackknife test using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8 and 96.9 %. The overall accuracies of three independent tests were 93, 93.4 and 91.8 %. These results suggest that the GA-MLR approach described herein should be a powerful method for selecting features that describe thermostable machines and an aid in the design of more stable proteins.
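The abstract does not define its features in detail, so the following is only a minimal sketch of how amino acid and g-gap dipeptide compositions are commonly computed. It assumes that a g-gap dipeptide is a pair of residues separated by g intervening positions (so a 0-gap dipeptide is an adjacent pair) and that counts are normalized to frequencies. The function names and the normalization are illustrative assumptions, not the authors' implementation, and the GA-MLR selection step is not shown.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the frequency of each standard amino acid in seq.

    Non-standard symbols are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def g_gap_dipeptide_composition(seq, g):
    """Return the frequency of each of the 400 g-gap dipeptides in seq.

    Assumed definition: residues at positions i and i + g + 1 form a
    g-gap dipeptide, so g = 0 gives ordinary adjacent dipeptides.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - g - 1):
        a, b = seq[i], seq[i + g + 1]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts[a + b] += 1
    total = sum(counts.values())
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    return {p: (counts[p] / total if total else 0.0) for p in pairs}


if __name__ == "__main__":
    example = "MKVLAAGIVG"  # hypothetical sequence, for illustration only
    aac = amino_acid_composition(example)
    gap0 = g_gap_dipeptide_composition(example, 0)
    gap1 = g_gap_dipeptide_composition(example, 1)
    # 20 + 400 + 400 candidate features before any subset selection.
    print(len(aac) + len(gap0) + len(gap1))
```

Under these assumptions each protein yields 820 candidate features (20 amino acids, 400 0-gap and 400 1-gap dipeptides); the subset reported above, nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides, would be chosen from such a pool by the selection procedure.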

Source
http://dx.doi.org/10.1007/s10529-014-1577-3
