The study aims to build a dataset that captures the function of small molecules by mining the chemical literature, rather than relying solely on traditional structure-based methods.
The authors introduce the Chemical Function (CheF) dataset, which contains 631,000 molecule-function pairs derived from patents using advanced AI techniques and covers a variety of chemical functions.
Analyses show that this dataset effectively represents the chemical function landscape, allowing researchers to identify drug candidates based on predicted functional profiles, thereby offering a new approach to molecular discovery.