In part two of this mini-series, we evaluate the range of machine-learning tools now available for application to veterinary clinical text mining. These tools will be vital for automating the extraction of information from the large datasets of veterinary clinical narratives curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass, where volumes running to millions of records preclude manual reading and the complexity of clinical notes limits the usefulness of more "traditional" text-mining approaches. We discuss the application of machine-learning techniques ranging from simple models that identify words and phrases with similar meanings, used to expand lexicons for keyword searching, to more complex language models.
The development of natural language processing techniques for deriving useful information from unstructured clinical narratives is a fast-paced and rapidly evolving area of machine-learning research. Large volumes of veterinary clinical narratives now exist, curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass, and the application of such techniques to these datasets is already improving (and will continue to improve) our understanding of disease and disease patterns within veterinary medicine. In part one of this two-part article series, we discuss the importance of understanding the lexical structure of clinical records, the use of basic tools for filtering records based on keywords, and more complex rule-based pattern-matching approaches.
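The keyword filtering and rule-based pattern matching mentioned above can be illustrated with a minimal sketch. This is an assumption about the general technique, not the SAVSNET/VetCompass pipeline: the clinical term ("vomiting"), the spelling variants, and the negation rule are invented examples.

```python
import re

# Hypothetical rule-based filter for flagging possible vomiting episodes
# in free-text consult notes. VOMIT matches common surface forms of the
# concept; NEGATED catches nearby negation ("no vomiting", "not vomiting").
VOMIT = re.compile(r"\b(vomit(?:ing|ed|s)?|emesis)\b", re.IGNORECASE)
NEGATED = re.compile(
    r"\b(no|not|denies|without)\s+(?:\w+\s+){0,2}(vomit\w*|emesis)\b",
    re.IGNORECASE,
)

def flag_record(note: str) -> bool:
    """Return True if the note mentions vomiting without nearby negation."""
    return bool(VOMIT.search(note)) and not NEGATED.search(note)

print(flag_record("O: bright, vomited twice overnight"))  # True
print(flag_record("no vomiting or diarrhoea reported"))   # False
```

Such rules are transparent and fast, but, as part one discusses, they struggle with the spelling variation and implicit context of real clinical notes, which motivates the machine-learning approaches of part two.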
Background: How to treat a disease remains the most common type of clinical question. Obtaining evidence-based answers from the biomedical literature is difficult. Analogical reasoning with embeddings from deep learning (embedding analogies) may extract such biomedical facts, although the state of the art focuses on pair-based proportional (pairwise) analogies such as man:woman::king:queen ("queen = -man +king +woman").
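The pairwise analogy "queen = -man +king +woman" can be sketched with toy vectors. The 3-d embeddings below are hand-made for illustration only; real work would use embeddings trained on biomedical text, which is an assumption here.

```python
import math

# Hand-made toy embeddings; in practice these would come from a model
# such as word2vec trained on a large corpus.
emb = {
    "man":   [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def analogy(a, b, c):
    """Solve a : b :: c : ? by nearest neighbour to (b - a + c)."""
    target = [emb[b][i] - emb[a][i] + emb[c][i] for i in range(3)]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))

print(analogy("man", "woman", "king"))  # queen
```

The vector arithmetic (subtract "man", add "woman" to "king", then take the nearest neighbour by cosine similarity) is exactly the pairwise scheme the abstract contrasts with richer forms of analogical reasoning.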
Background: Deep learning opens up opportunities for routinely scanning large bodies of biomedical literature and clinical narratives to represent the meaning of biomedical and clinical terms. However, validating and integrating this knowledge at scale requires cross-checking with ground truths (i.e.
Background: Automatically identifying term variants, or acceptable alternative free-text terms, for gene and protein names across millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) whether word embeddings from deep learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) whether biological knowledge from the CVDO can improve such a list without modifying the word embeddings created.
We investigate the application of distributional semantics models to facilitate unsupervised extraction of biomedical terms from unannotated corpora. Term extraction is used as the first step of an ontology-learning process that aims at (semi-)automatic annotation of biomedical concepts and relations from more than 300K PubMed titles and abstracts. We experimented both with traditional distributional semantics methods, such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), and with the neural language models CBOW and Skip-gram from deep learning.
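The distributional hypothesis underlying all of these methods, that terms occurring in similar contexts have similar meanings, can be shown with a minimal co-occurrence sketch. This is an assumption about the general idea, not the authors' pipeline, and the three titles are invented examples.

```python
import math
from collections import Counter

# Invented example titles standing in for a PubMed corpus.
titles = [
    "myocardial infarction biomarkers in plasma",
    "myocardial ischemia biomarkers in serum",
    "renal fibrosis markers in urine",
]

# Represent each word by the counts of words co-occurring with it
# in the same title (a bag-of-contexts vector).
vectors = {}
for title in titles:
    words = title.split()
    for w in words:
        ctx = vectors.setdefault(w, Counter())
        ctx.update(x for x in words if x != w)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv)

# Terms sharing contexts ("myocardial", "biomarkers", "in") score higher:
print(cosine(vectors["infarction"], vectors["ischemia"]))
print(cosine(vectors["infarction"], vectors["fibrosis"]))
```

LSA, LDA, CBOW and Skip-gram each replace this raw co-occurrence vector with a denser, lower-dimensional representation, but all rely on the same contextual signal.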
Clinical proteomics has led to the identification of a substantial number of disease-associated peptides and protein fragments in conditions such as cancer and kidney or cardiovascular disease. In silico prediction tools that can link identified peptide biomarkers to predicted protease activity might therefore contribute significantly to our understanding of the pathophysiological mechanisms of these diseases. Proteasix is an open-source, peptide-centric tool that can be used to predict in silico the proteases involved in naturally occurring peptide generation.
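The kind of lookup a peptide-centric predictor performs can be sketched as follows. This is a hypothetical simplification, not the Proteasix implementation: only trypsin's well-known preference for cleaving C-terminal to Lys/Arg is encoded, and the protein and peptide sequences are invented.

```python
# Hypothetical P1-specificity table: trypsin cleaves after Lys (K) or Arg (R).
P1_RULES = {"trypsin": {"K", "R"}}

def candidate_proteases(protein: str, peptide: str):
    """Given a parent protein and an observed peptide, return proteases
    whose P1 preference matches the residue just before the peptide."""
    start = protein.find(peptide)
    if start <= 0:
        return set()  # peptide not found, or it is the native N-terminus
    p1 = protein[start - 1]  # residue immediately preceding the cleavage site
    return {prot for prot, residues in P1_RULES.items() if p1 in residues}

protein = "MALWKSIVEQCCTSR"           # invented parent sequence
print(candidate_proteases(protein, "SIVEQ"))  # cleavage after K -> {'trypsin'}
```

A real tool matches both peptide termini against curated specificity data for many proteases; the principle, mapping observed cleavage sites back to enzymes that could have produced them, is the same.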
Extracting information from peptidomics data is a major current challenge, as endogenous peptides can result from the activity of multiple enzymes. Proteolytic enzymes can display overlapping or complementary specificities. The activity spectrum of human endogenous peptide-generating proteases is not fully known.
Background: The Proteasix Ontology (PxO) is an ontology that supports the Proteasix tool, an open-source peptide-centric tool that can be used to predict automatically, in silico and at large scale, the proteases involved in the generation of proteolytic cleavage fragments (peptides).
Methods: The PxO re-uses parts of the Protein Ontology, the three Gene Ontology sub-ontologies, the Chemical Entities of Biological Interest Ontology, and the Sequence Ontology, together with bespoke extensions to the PxO, in support of a series of roles: 1. To describe the known proteases and their target cleavage sites. 2.
Background: Patients with multiple conditions have complex needs and are increasing in number as populations age. This multimorbidity is one of the greatest challenges facing health care. Having more than 1 condition generates (1) interactions between pathologies, (2) duplication of tests, (3) difficulties in adhering to often conflicting clinical practice guidelines, (4) obstacles in the continuity of care, (5) confusing self-management information, and (6) medication errors.