Publications by authors named "Anastasiya V Kulikova"

Class II microcins are antimicrobial peptides that have shown some potential as novel antibiotics. However, to date, only 10 class II microcins have been described, and the discovery of novel microcins has been hampered by their short length and high sequence divergence. Here, we ask if we can use numerical embeddings generated by protein large language models to detect microcins in bacterial genome assemblies and whether this method can outperform sequence-based methods such as BLAST.
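As a rough illustration of embedding-based detection, the sketch below ranks candidate sequences by cosine similarity between their embeddings and those of known microcins. It is a minimal toy under stated assumptions, not the authors' pipeline: the three-dimensional vectors stand in for real protein language model embeddings, and the names `rank_candidates`, `known`, and `candidates` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(known_embeddings, candidate_embeddings):
    """Rank candidates by their best similarity to any known embedding."""
    scored = []
    for name, vec in candidate_embeddings.items():
        best = max(cosine_similarity(vec, ref)
                   for ref in known_embeddings.values())
        scored.append((name, best))
    # Highest similarity first
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings (assumption: real embeddings would come from a protein
# language model and have hundreds or thousands of dimensions).
known = {
    "known_microcin_1": [1.0, 0.0, 0.0],
    "known_microcin_2": [0.9, 0.1, 0.0],
}
candidates = {
    "orf_a": [0.8, 0.2, 0.0],
    "orf_b": [0.0, 0.0, 1.0],
}

ranking = rank_candidates(known, candidates)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In this toy data, `orf_a` points in nearly the same direction as the known embeddings and ranks first, while `orf_b` is orthogonal to both. A sequence-based baseline such as BLAST would instead compare the residues directly.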

Deep learning models are seeing increased use as methods to predict mutational effects or allowed mutations in proteins. The models commonly used for these purposes include large language models (LLMs) and 3D Convolutional Neural Networks (CNNs). These two model types have very different architectures and are commonly trained on different representations of proteins.

Article Synopsis
  • Machine learning and deep learning techniques use large datasets of protein sequences and their effects to better predict genetic variants that enhance fitness.
  • Despite the variety of machine learning methods being developed, the key factor for success is the quality and availability of training data rather than just the algorithms themselves.
  • Advancements in unsupervised and self-supervised learning frameworks show promise for situations with limited data, suggesting that machine learning will significantly advance protein biochemistry and engineering in the future.

One fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest.
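The two prediction targets named above, the wild-type residue and the alignment consensus, can be compared with a short scoring sketch. This is an illustrative toy under stated assumptions, not the authors' evaluation code: the per-site probability vectors stand in for the output of a trained 3D CNN, and the helper names `predicted_residue`, `consensus_residue`, and `accuracy` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def predicted_residue(probabilities):
    """Return the amino acid with the highest predicted probability."""
    best_index = max(range(len(AMINO_ACIDS)), key=lambda i: probabilities[i])
    return AMINO_ACIDS[best_index]


def consensus_residue(column):
    """Most common non-gap residue in one alignment column.

    Assumes the column contains at least one non-gap character.
    """
    counts = Counter(res for res in column if res != "-")
    return counts.most_common(1)[0][0]


def accuracy(predictions, targets):
    """Fraction of sites where the prediction matches the target."""
    matches = sum(p == t for p, t in zip(predictions, targets))
    return matches / len(targets)


# Toy model output (assumption): one probability vector per site,
# peaked at 'A' for site 0 and at 'L' for site 1.
site_probabilities = [
    [0.9 if aa == peak else 0.1 / 19 for aa in AMINO_ACIDS]
    for peak in "AL"
]
wild_type = "AV"                      # residues in the reference structure
alignment_columns = ["AAAG", "LLVL"]  # one string per aligned site

predictions = [predicted_residue(p) for p in site_probabilities]
consensus = [consensus_residue(col) for col in alignment_columns]

wild_type_accuracy = accuracy(predictions, list(wild_type))
consensus_accuracy = accuracy(predictions, consensus)
print(predictions, wild_type_accuracy, consensus_accuracy)
```

In this toy case the prediction at the second site matches the alignment consensus but not the wild-type residue, which is the kind of disagreement that makes the two targets worth scoring separately.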
