FunSAV: predicting the functional effect of single amino acid variants using a two-stage random forest model.

PLoS One

National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China.

Published: February 2013

Single amino acid variants (SAVs) are the most abundant form of known genetic variation associated with human disease. Successful prediction of the functional impact of SAVs from sequence can therefore improve our understanding of why a SAV may be associated with a particular disease. In this work, we constructed a structural dataset of 679 high-quality protein structures with 2,048 SAVs by collecting human genetic variant data from multiple resources and dividing the variants into two categories: disease-associated and neutral. We built a two-stage random forest (RF) model, termed FunSAV, to predict the functional effect of SAVs by combining sequence, structure and residue-contact network features with additional features that were not explored in previous studies. Importantly, we propose a two-step feature selection procedure to select the most informative features that contribute to predicting the disease association of SAVs. In cross-validation experiments on the benchmark dataset, FunSAV achieved an area under the curve (AUC) of 0.882, which is competitive with, and in some cases better than, existing tools including SIFT, SNAP, Polyphen2, PANTHER, nsSNPAnalyzer and PhD-SNP. The source code of FunSAV and the datasets can be downloaded at http://sunflower.kuicr.kyoto-u.ac.jp/sjn/FunSAV.
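The abstract reports performance as an AUC. As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of (disease-associated, neutral) pairs in which the disease-associated variant receives the higher score. It is not taken from the FunSAV source code; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for this example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win
    (the Mann-Whitney U formulation of AUC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = disease-associated, 0 = neutral.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

# 8 of the 9 positive/negative pairs are ranked correctly, so AUC = 8/9.
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this reading, an AUC of 0.882 means that a randomly chosen disease-associated variant would be scored above a randomly chosen neutral variant in roughly 88% of such pairs.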

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3427247
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0043847

Publication Analysis

Top Keywords

single amino: 8
amino acid: 8
acid variants: 8
two-stage random: 8
random forest: 8
forest model: 8
savs: 5
funsav: 4
funsav predicting: 4
predicting functional: 4

Similar Publications

Purpose: Heterozygous mutations in the insulin gene can give rise to a monogenic diabetes syndrome due to toxic misfolding of the variant proinsulin in the endoplasmic reticulum (ER) of pancreatic β-cells. Clinical mutations are widely distributed in the sequence (86 amino acids). Misfolding induces chronic ER stress and interferes with wild-type biosynthesis and secretion.

This study proposes fluorenylmethoxycarbonyl (Fmoc)-protected single amino acids (Fmoc-AAs) as a minimalistic model system to investigate liquid-liquid phase separation (LLPS) and the elusive liquid-to-solid transition of condensates. We demonstrated that Fmoc-AAs exhibit LLPS depending on the pH and ionic strength, primarily driven by hydrophobic interactions. Systematic examination of the conditions under which each Fmoc-AA undergoes LLPS revealed distinct residue-dependent trends in the critical concentrations and phase behavior.

The Novel HLA-B*37:114 Allele Identified by Next-Generation Sequencing in a Chinese Individual.

HLA

January 2025

Department of Transfusion, The First Affiliated Hospital of Nanjing Medical University, Jiangsu Province Hospital, Nanjing, China.

HLA-B*37:114 has a single non-synonymous change from HLA-B*37:01:01:01, changing residue 163 from threonine to lysine.

The cyclooxygenase 2 (COX-2) and 5-lipoxygenase (5-LOX) inhibitors 5-([2,5-dihydroxybenzyl]amino)salicylamides (compounds 1-11) were examined for potential anticancer activity, and the possible underlying mechanisms were assessed. Compounds were tested at a single dose against a panel of 60 cancer cell lines, and those with the highest activity were tested in the five-dose assay. COMPARE analysis was conducted to explore potential mechanisms underlying their biological activity.

The PNPLA3-I148M genotype is the strongest predictive single-nucleotide polymorphism for liver fat. We examine whether PNPLA3-I148M modifies associations of oxidative gaseous air pollutant exposure (O) with (i) liver fat and (ii) multi-omics profiles of miRNAs and metabolites linked to liver fat. Participants were 69 young adults (17-22 years) from the Meta-AIR cohort.
