The text mining of patents of pharmaceutical interest poses a number of unique challenges not encountered in other fields of text mining. Unlike fields such as bioinformatics, where the number of terms of interest is enumerable and essentially static, systematic chemical nomenclature can describe an infinite number of molecules. Hence, the dictionary- and ontology-based techniques that are commonly used for gene names, diseases, species, etc.
The increase in drug research output from patent applications, together with the expansion of public data collections, such as ChEMBL and PubChem BioAssay, has made it essential for pharmaceutical companies to integrate both internal and external 'SAR estate'. The AstraZeneca response has been the development of an enterprise application, Chemistry Connect, containing 45 million unique chemical structures from 18 internal and external data sources. It includes merged compound-to-assay-to-result-to-target relationships extracted from patents, papers and internal data.