Specific amino acid (AA) binding by aminoacyl-tRNA synthetases (aaRSs) is necessary for correct translation of the genetic code. Sequence and structure analyses have revealed the main specificity determinants and allowed aaRSs to be partitioned into two classes and several subclasses. However, the information contributed by each determinant has not been precisely quantified, and other, minor determinants may remain unidentified. The growth of genomic data and the development of machine learning classification methods allow us to revisit these questions. This work considers subclass IIb, formed by three enzymes: aspartyl-, asparaginyl-, and lysyl-tRNA synthetase (AspRS, AsnRS, and LysRS). Over 35,000 sequences from the Pfam database were used to train a machine learning model based on ensembles of decision trees. The model was trained to reproduce the existing classification of each sequence as AspRS, AsnRS, or LysRS, and to identify which sequence positions were most important for the classification. A few positions (5 to 8, depending on the AA substrate) sufficed for accurate classification. Most, but not all, of them were well-known specificity determinants. The machine learning models thus identified sets of mutations that distinguish the three subclass members; these mutations might be targeted in engineering efforts to alter or swap AA specificities for biotechnology applications.
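A decision tree chooses where to split by asking which feature most reduces class impurity. Ensemble models aggregate these impurity decreases over many trees to rank features, which here are alignment positions. The sketch below illustrates that core quantity and does not reproduce the authors' model. It scores each aligned column by the Gini impurity decrease from a single multiway split on residue identity. The function names and the six-sequence toy alignment are hypothetical and are not real aaRS data.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def column_gini_gain(seqs, labels, col):
    """Impurity decrease from partitioning the sequences by the residue at `col`.
    This is a multiway split with one branch per residue type; a gap '-' is
    simply treated as one more symbol (an assumption of this sketch)."""
    groups = defaultdict(list)
    for s, y in zip(seqs, labels):
        groups[s[col]].append(y)
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted

def rank_positions(seqs, labels):
    """Return (column, gain) pairs sorted from most to least informative."""
    length = len(seqs[0])
    gains = [(c, column_gini_gain(seqs, labels, c)) for c in range(length)]
    return sorted(gains, key=lambda t: t[1], reverse=True)

# Hypothetical toy alignment (NOT real aaRS sequences): column 1 separates the
# three classes perfectly, column 3 only partially, columns 0 and 2 are constant.
seqs = ["AKGD", "AKGD", "ANGE", "ANGD", "ARGE", "ARGE"]
labels = ["AspRS", "AspRS", "AsnRS", "AsnRS", "LysRS", "LysRS"]

for col, gain in rank_positions(seqs, labels):
    print(col, round(gain, 3))
```

On this toy input, column 1 ranks first with a gain of 0.667, the full impurity of three balanced classes. Column 3 follows at 0.222, and the constant columns score 0. A tree ensemble such as a random forest would repeat this kind of evaluation over bootstrapped data and nested splits and then average the decreases. That is why only a handful of highly ranked positions can suffice for accurate classification.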
DOI: http://dx.doi.org/10.1016/j.jmgm.2024.108818