Severity: Warning
Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests
Filename: helpers/my_audit_helper.php
Line Number: 176
Backtrace:
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3122
Function: getPubMedXML
File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global
File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword
File: /var/www/html/index.php
Line: 316
Function: require_once
With machine learning now transforming the sciences, successful prediction of biological structure or activity is mainly limited by the extent and quality of data available for training, the astute choice of features for prediction, and thorough assessment of the robustness of prediction on a variety of new cases. In this chapter, we address these issues while developing and sharing protocols to build a robust dataset and rigorously compare several predictive classifiers using the open-source Python machine learning library, scikit-learn. We show how to evaluate whether enough data has been used for training and whether the classifier has been overfit to training data. The most telling experiment is 500-fold repartitioning of the training and test sets, followed by prediction, which gives a good indication of whether a classifier performs consistently well on different datasets. An intuitive method is used to quantify which features are most important for correct prediction.The resulting well-trained classifier, hotspotter, can robustly predict the small subset of amino acid residues on the surface of a protein that are energetically most important for binding a protein partner: the interaction hot spots. Hotspotter has been trained and tested here on a curated dataset assembled from 1046 non-redundant alanine scanning mutation sites with experimentally measured change in binding free energy values from 97 different protein complexes; this dataset is available to download. The accessible surface area of the wild-type residue at a given site and its degree of evolutionary conservation proved the most important features to identify hot spots. A variant classifier was trained and validated for proteins where only the amino acid sequence is available, augmented by secondary structure assignment. This version of hotspotter requiring fewer features is almost as robust as the structure-based classifier. Application to the ACE2 (angiotensin converting enzyme 2) receptor, which mediates COVID-19 virus entry into human cells, identified the critical hot spot triad of ACE2 residues at the center of the small interface with the CoV-2 spike protein. Hotspotter results can be used to guide the strategic design of protein interfaces and ligands and also to identify likely interfacial residues for protein:protein docking.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1007/978-1-0716-3441-7_14 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!