Custom tokenization dictionary (CUSTODI) is introduced as a novel approach to the problem of molecular representation, and in particular to the challenge of molecular property prediction. Herein, the motivating theory, the representation, and the model are presented and shown to perform in line with benchmark methodologies. CUSTODI is distinguished by its applicability to small training sets, and the developed theory suggests that it could be used for a priori estimation of future fit quality on any given dataset, regardless of the method used for fitting.

Source
http://dx.doi.org/10.1021/acs.jcim.1c00563
