In Biomedical Named Entity Recognition (BioNER), the use of state-of-the-art deep learning methods, such as deep transformer models (e.g. the bidirectional BERT or the autoregressive GPT-3), can be substantially hampered by the lack of publicly available annotated datasets. When a BioNER system must annotate multiple entity types, further challenges arise because most publicly available datasets contain annotations for only one entity type. For example, disease mentions may be left unannotated in a dataset specialized in drug recognition, which yields a poor ground truth when the two datasets are used to train a single multi-task model. In this work, we propose TaughtNet, a knowledge distillation-based framework that fine-tunes a single multi-task student model by leveraging both the ground truth and the knowledge of single-task teachers. Our experiments on recognizing mentions of diseases, chemical compounds and genes demonstrate the effectiveness of our approach compared with strong state-of-the-art baselines in terms of precision, recall and F1 score. Moreover, TaughtNet makes it possible to train smaller and lighter student models, which may be easier to use in real-world scenarios where they must be deployed on memory-limited hardware and deliver fast inference. The framework also shows strong potential for explainability. We publicly release our code on GitHub and our multi-task model on the Hugging Face Hub.
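
The idea of combining ground truth with single-task teachers can be illustrated with a minimal sketch. This is not TaughtNet's published implementation: the label set, the independence-based merging rule in the hypothetical `aggregate_teachers`, and the Hinton-style weighting in the hypothetical `distillation_loss` (parameters `alpha` and `temperature`) are all assumptions chosen for illustration. Each teacher is assumed to emit logits over O/B/I for its own entity type, while the student emits logits over the joint BIO label set.

```python
import math

# Hypothetical setup: three single-task teachers (disease, chemical, gene),
# each tagging tokens with O/B/I for its own entity type only.
ENTITY_TYPES = ["Disease", "Chemical", "Gene"]
# Joint label set for the multi-task student:
# O, B-Disease, I-Disease, B-Chemical, I-Chemical, B-Gene, I-Gene
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_teachers(teacher_probs):
    """Merge per-type (O, B, I) teacher distributions into one
    distribution over LABELS.

    Assumed rule (illustrative, not necessarily TaughtNet's): teachers are
    treated as independent, so a token is O only if every teacher predicts
    O, and is B-k / I-k if teacher k predicts B / I while all other
    teachers predict O. The scores are then renormalized.
    """
    scores = [math.prod(p[0] for p in teacher_probs)]
    for k, pk in enumerate(teacher_probs):
        others_o = math.prod(
            p[0] for j, p in enumerate(teacher_probs) if j != k
        )
        scores.append(pk[1] * others_o)  # B-<type k>
        scores.append(pk[2] * others_o)  # I-<type k>
    total = sum(scores)
    return [s / total for s in scores]


def distillation_loss(student_logits, gold_index, teacher_logits,
                      alpha=0.5, temperature=2.0):
    """Per-token loss (hypothetical weighting):
    alpha * CE(gold) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard term: cross-entropy against the (possibly incomplete) gold label.
    hard = -math.log(softmax(student_logits)[gold_index])
    # Soft term: KL divergence from the aggregated, temperature-softened
    # teacher distribution to the temperature-softened student distribution.
    q = aggregate_teachers([softmax(t, temperature) for t in teacher_logits])
    p = softmax(student_logits, temperature)
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * kl


# One token from a drug-focused dataset: the gold label is O because disease
# mentions were never annotated there, but the disease teacher predicts B.
teacher_logits = [[0.0, 4.0, 0.0],   # disease teacher: B
                  [4.0, 0.0, 0.0],   # chemical teacher: O
                  [4.0, 0.0, 0.0]]   # gene teacher: O
student_logits = [0.5, 2.0, 0.1, 0.0, 0.0, 0.0, 0.0]
gold = LABELS.index("O")

q = aggregate_teachers([softmax(t) for t in teacher_logits])
print(LABELS[q.index(max(q))])  # → B-Disease
print(distillation_loss(student_logits, gold, teacher_logits))
```

In this toy token, the ground truth says O only because the drug-focused dataset never annotated diseases, while the aggregated teacher target concentrates its mass on B-Disease. The soft term therefore pulls the student toward the entity the ground truth missed, which is the kind of gap the abstract describes; the actual merging rule and loss weighting used by TaughtNet may differ.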

Source
http://dx.doi.org/10.1109/JBHI.2023.3244044

Similar Publications

Over the past decade, epigenetic clocks have emerged as powerful machine learning tools, not only to estimate chronological and biological age but also to assess the efficacy of anti-ageing, cellular rejuvenation and disease-preventive interventions. However, many computational and statistical challenges remain that limit our understanding, interpretation and application of epigenetic clocks. Here, we review these computational challenges, focusing on interpretation, cell-type heterogeneity and emerging single-cell methods, aiming to provide guidelines for the rigorous construction of interpretable epigenetic clocks at cell-type and single-cell resolution.

Background: Malaria remains a substantial public health burden among young children in sub-Saharan Africa and a highly efficacious vaccine eliciting a durable immune response would be a useful tool for controlling malaria. R21 is a malaria vaccine comprising nanoparticles, formed from a circumsporozoite protein and hepatitis B surface antigen (HBsAg) fusion protein, without any unfused HBsAg, and is administered with the saponin-based Matrix-M adjuvant. This study aimed to assess the safety and immunogenicity of the malaria vaccine candidate, R21, administered with or without adjuvant Matrix-M in adults naïve to malaria infection and in healthy adults from malaria endemic areas.

Background: R21 is a novel malaria vaccine, composed of a fusion protein of the malaria circumsporozoite protein and hepatitis B surface antigen. Following favourable safety and immunogenicity in a phase 1 study, we aimed to assess the efficacy of R21 administered with Matrix-M (R21/MM) against clinical malaria in adults from the UK who were malaria naive in a controlled human malaria infection study.

Methods: In this open-label, partially blinded, phase 1-2A controlled human malaria infection study undertaken in Oxford, Southampton, and London, UK, we tested five novel vaccination regimens of R21/MM.

BRD4-targeted photodegradation nanoplatform for light activatable melanoma therapy.

Biomaterials

January 2025

State Key Laboratory of Advanced Medical Materials and Devices, Tianjin Key Laboratory of Biomedical Materials, Institute of Biomedical Engineering, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300192, China. Electronic address:

The targeted protein degradation (TPD) strategy modulates tumor growth pathways by degrading proteins of interest (POIs) and has reshaped anti-tumor drug research and development. Recently, the emergence of photodegradation-targeting chimeras (PDTACs), combined with laser irradiation at specific sites, has enabled precise spatiotemporal control of TPD. Capitalizing on advances in PDTACs, we report a nanoplatform for efficiently delivering a PDTAC molecule for the photodegradation of bromodomain-containing protein 4 (BRD4), a key activator of oncogenic transcription.

Adenoviruses are a concern for pigeon breeders because of their impact on animal health. They have been studied for nearly five decades and are among the most studied viruses in pigeons. However, very few complete genomic sequences of pigeon-infecting adenoviruses are available, and the pathogenic effect of these viruses on pigeons has yet to be thoroughly explored.