Similarity corpus on microbial transcriptional regulation.

J Biomed Semantics

Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100, México.

Published: May 2019

Background: The ability to express the same meaning in different ways is a well-known property of natural language. This property is also a source of major difficulties in natural language processing. Given the constant increase in published literature, its curation and information extraction would benefit greatly from efficient automatic processes, for which corpora of sentences evaluated by experts are a valuable resource.

Results: Given our interest in applying such approaches to the curation of the biomedical literature, specifically the literature on gene regulation in microbial organisms, we built a corpus of sentence pairs with graded textual similarity, evaluated by curators and designed specifically for our purposes. Based on the predefined statistical power of future analyses, we defined features of the design, including sampling, selection criteria, balance, and size, among others. A non-fully crossed study design was applied: each pair of sentences was evaluated by 3 annotators drawn from a total of 7. The scale used in the semantic similarity assessment task of the Semantic Evaluation workshop (SEMEVAL) was adapted to our goals over four successive iterative sessions, with clear improvements in both the agreed guidelines and the interrater reliability results. Alternatives for evaluating such a corpus are discussed at length.
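
The abstract does not state which interrater reliability coefficient was used. As an illustration only, the sketch below computes Fleiss' kappa, a coefficient that fits a design like the one described: every sentence pair receives the same number of ratings (3), but the raters may differ from pair to pair. The function name `fleiss_kappa`, the toy ratings, and the 0-5 grade range (modeled on the SEMEVAL similarity scale) are assumptions for this example, not data or code from the study.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]], categories: Sequence[int]) -> float:
    """Fleiss' kappa for items that each received the same number of ratings.

    ratings[i] holds the grades given to item i (e.g. one sentence pair).
    Raters may differ between items, as in a non-fully crossed design,
    but every item must have the same number of ratings.
    """
    if not ratings:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two ratings per item")
    if any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number of ratings")
    allowed = set(categories)
    if any(grade not in allowed for r in ratings for grade in r):
        raise ValueError("rating outside the declared categories")

    n_items = len(ratings)

    # counts[i][j]: number of raters who put item i into category j
    counts = []
    for r in ratings:
        tally = Counter(r)
        counts.append([tally[c] for c in categories])

    # Observed agreement per item: share of agreeing rater pairs
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of ratings in each category
    total = n_items * n_raters
    p_cats = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    scale = range(6)  # hypothetical 0-5 similarity grades
    # Hypothetical grades: 4 sentence pairs, 3 annotators each
    toy = [[5, 5, 4], [0, 0, 0], [3, 2, 3], [1, 1, 2]]
    print(round(fleiss_kappa(toy, scale), 3))
```

Fleiss' kappa treats the grades as unordered categories, so a disagreement between 4 and 5 counts the same as one between 0 and 5. For an ordinal similarity scale, a weighted coefficient such as Krippendorff's alpha with an ordinal or interval distance is often preferred.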

Conclusions: To the best of our knowledge, this is the first similarity corpus (a dataset of sentence pairs for which human experts rate the semantic similarity of each pair) in this domain of knowledge. We have begun incorporating it into our research towards high-throughput curation strategies based on natural language processing.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6532127
DOI: http://dx.doi.org/10.1186/s13326-019-0200-x


