The accurate prediction of protein family from amino acid sequence by measuring features of sequence fragments.

Huixiao Hong, Qilong Hong, Roger Perkins, Leming Shi, Hong Fang, Zhenqiang Su, Yvonne Dragan, James C Fuscoe, Weida Tong

J Comput Biol

Division of Systems Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas 72079, USA.

Published: December 2009

The rapid advances in proteomic analyses, coupled with the completion of multiple genomes, have increased the demand for determining protein functions. A first step toward this goal is classifying, or predicting, the family to which a protein belongs. A method was developed for predicting protein family from protein sequence alone using support vector machine (SVM) models. In these models, the amino acids were classified into three categories (apolar, polar, and charged). Consecutive fragments one to five residues long were annotated by amino acid category to define the features of each protein. SVM models were constructed from the features of a training set of proteins and then evaluated on an independent set of proteins. The approach was tested on 20 protein families from the iProClass database of the Protein Information Resource (PIR). Two-class SVM models achieved an average prediction accuracy of 0.9985, and multi-class SVM models achieved an accuracy of 0.9941. This study demonstrates that SVM-based methods can accurately recognize and predict the family to which a protein belongs based solely on its primary amino acid sequence.
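The feature definition in the abstract (three residue categories, consecutive fragments of length one to five) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not give the exact residue-to-category assignment, nor whether fragment counts were normalized, so the grouping in `CATEGORY` and the per-length relative frequencies are assumptions, and `encode`, `feature_names` and `fragment_features` are hypothetical names. With three categories there are 3 + 9 + 27 + 81 + 243 = 363 possible fragment patterns, so every protein maps to a vector of the same 363 dimensions.

```python
from itertools import product

# ASSUMPTION: residue-to-category assignment. The abstract names the three
# categories (apolar, polar, charged) but not which residues go where.
CATEGORY = {
    **{aa: "A" for aa in "AVLIMFWPG"},  # apolar
    **{aa: "P" for aa in "STCYNQ"},     # polar (uncharged)
    **{aa: "C" for aa in "DEKRH"},      # charged
}

MAX_LEN = 5  # fragment lengths 1..5, as stated in the abstract


def encode(sequence):
    """Map each residue to its category letter; non-standard residues become 'X'."""
    return "".join(CATEGORY.get(aa, "X") for aa in sequence.upper())


def feature_names(max_len=MAX_LEN):
    """Every category pattern of length 1..max_len, in a fixed order."""
    names = []
    for k in range(1, max_len + 1):
        names.extend("".join(p) for p in product("APC", repeat=k))
    return names


def fragment_features(sequence, max_len=MAX_LEN):
    """Relative frequency of each category pattern among consecutive fragments.

    ASSUMPTION: counts are normalized by the number of valid fragments of the
    same length, so proteins of different lengths are comparable.
    """
    encoded = encode(sequence)
    names = feature_names(max_len)
    counts = dict.fromkeys(names, 0)
    totals = [0] * (max_len + 1)
    for k in range(1, max_len + 1):
        for i in range(len(encoded) - k + 1):
            frag = encoded[i:i + k]
            if "X" in frag:
                continue  # skip fragments that span a non-standard residue
            counts[frag] += 1
            totals[k] += 1
    return [counts[n] / totals[len(n)] if totals[len(n)] else 0.0 for n in names]


if __name__ == "__main__":
    vec = fragment_features("MKTAYIAKQR")
    print(len(vec))  # one value per fragment pattern
```

The fixed ordering from `feature_names` is the important design choice: an SVM needs every protein as a vector with the same dimensions in the same order. Vectors built this way for a labeled training set could then be passed to an off-the-shelf SVM implementation (for example LIBSVM) to fit two-class or multi-class models; which kernel and parameters the authors used is not stated in the abstract.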

DOI: http://dx.doi.org/10.1089/cmb.2008.0115

Similar Publications

Transformer-Based Tool for Automated Fact-Checking: A Pilot Study on Online Health Information.

JMIR Infodemiology

December 2024

Department of Management, Evaluation and Health Policy, School of Public Health, Université de Montréal, Montreal, CA.

Azadeh Bayani, Alexandre Ayotte, Jean Noel Nikiema

Background: Many people seek health-related information online. The significance of reliable information became particularly evident due to the potential dangers of misinformation. Therefore, discerning true and reliable information from false information has become increasingly challenging.

Machine-Learning Based Computed Tomography Radiomics Nomogram For Predicting Perineural Invasion In Gastric Cancer.

Curr Med Imaging

January 2025

Department of Radiology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China.

Pei Huang, Sheng Li, Zhikang Deng, Fangfang Hu, Di Jin

Objective: The aim of this study was to develop and validate predictive models for perineural invasion (PNI) in gastric cancer (GC) using clinical factors and radiomics features derived from contrast-enhanced computed tomography (CE-CT) scans and to compare the performance of these models.

Methods: This study included 205 GC patients, who were randomly divided into a training set (n=143) and a validation set (n=62) in a 7:3 ratio. Optimal radiomics features were selected using the least absolute shrinkage and selection operator (LASSO) algorithm.

Artificial Intelligence in Predicting Postpartum Hemorrhage in Twin Pregnancies Undergoing Cesarean Section.

Twin Res Hum Genet

January 2025

Necmettin Erbakan University Medical School of Meram, Department of Obstetrics and Gynecology, Division of Fetal and Maternal Medicine, Konya, Turkey.

Sukran Dogru, Huriye Ezveci, Fatih Akkus, Pelin Bahçeci, Fikriye Karanfil Yaman

This study aimed to create a risk prediction model with artificial intelligence (AI) to identify patients at higher risk of postpartum hemorrhage using perinatal characteristics that may be associated with later postpartum hemorrhage (PPH) in twin pregnancies that underwent cesarean section. The study was planned as a retrospective cohort study at University Hospital. All twin cesarean deliveries were categorized into two groups: those with and without PPH.

Differentiation between multiple sclerosis and neuromyelitis optica spectrum disorders with multilevel fMRI features: A machine learning analysis.

Sci Rep

January 2025

Department of Radiology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, 330006, Jiangxi, China.

Xiao Liang, Qingwen Zeng, Yanyan Zhu, Yao Wang, Ting He

The conventional statistical approach for analyzing resting state functional MRI (rs-fMRI) data struggles to accurately distinguish between patients with multiple sclerosis (MS) and those with neuromyelitis optica spectrum disorders (NMOSD), highlighting the need for improved diagnostic efficacy. In this study, multilevel functional metrics including resting state functional connectivity, amplitude of low frequency fluctuation (ALFF), and regional homogeneity (ReHo) were calculated and extracted from 116 regions of interest in the anatomical automatic labeling atlas. Subsequently, classifiers were developed using different combinations of these selected features to distinguish between MS and NMOSD.

Identify characteristics of Vietnamese oral squamous cell carcinoma patients by machine learning on transcriptome and clinical-histopathological analysis.

J Dent Sci

December 2024

Blood Transfusion Haematology Hospital No. 2, Ho Chi Minh City, Viet Nam.

Huong Thu Duong, Nam Cong-Nhat Huynh, Chi Thi-Kim Nguyen, Linh Gia-Hoang Le, Khoa Dang Nguyen

Background/purpose: Oral squamous cell carcinoma (OSCC) is notorious for its low survival rates, due to the advanced stage at which it is commonly diagnosed. To enhance early detection and improve prognostic assessments, our study harnesses the power of machine learning (ML) to dissect and interpret complex patterns within mRNA-sequencing (RNA-seq) data and clinical-histopathological features.

Materials And Methods: 206 retrospective Vietnamese OSCC formalin-fixed paraffin-embedded (FFPE) tumor samples, of which 101 were subjected to RNA-seq for classification based on gene expression.
