Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining.
View Article and Find Full Text PDFMachine learning (ML) is a key technology for accurate prediction of antibody-antigen binding. Two orthogonal problems hinder the application of ML to antibody-specificity prediction and the benchmarking thereof: the lack of a unified ML formalization of immunological antibody-specificity prediction problems and the unavailability of large-scale synthetic datasets to benchmark real-world relevant ML methods and dataset design. Here we developed the Absolut! software suite that enables parameter-based unconstrained generation of synthetic lattice-based three-dimensional antibody-antigen-binding structures with ground-truth access to conformational paratope, epitope and affinity.
View Article and Find Full Text PDFAlthough the therapeutic efficacy and commercial success of monoclonal antibodies (mAbs) are tremendous, the design and discovery of new candidates remain a time and cost-intensive endeavor. In this regard, progress in the generation of data describing antigen binding and developability, computational methodology, and artificial intelligence may pave the way for a new era of on-demand immunotherapeutics design and discovery. Here, we argue that the main necessary machine learning (ML) components for an mAb sequence generator are: understanding of the rules of mAb-antigen binding, capacity to modularly combine mAb design parameters, and algorithms for unconstrained parameter-driven mAb sequence synthesis.
View Article and Find Full Text PDFThis paper shows how methods from statistical relational learning can be used to address problems in grammatical inference using model-theoretic representations of strings. These model-theoretic representations are the basis of representing formal languages logically. Conventional representations include a binary relation for order and unary relations describing mutually exclusive properties of each position in the string.
View Article and Find Full Text PDFThe cho-1 gene in Caenorhabditis elegans encodes a high-affinity plasma-membrane choline transporter believed to be rate limiting for acetylcholine (ACh) synthesis in cholinergic nerve terminals. We found that CHO-1 is expressed in most, but not all cholinergic neurons in C. elegans.
View Article and Find Full Text PDF