Similarity corpus on microbial transcriptional regulation.

Oscar Lithgow-Serrano, Socorro Gama-Castro, Cecilia Ishida-Gutiérrez, Citlalli Mejía-Almonte, Víctor H Tierrafría, Sara Martínez-Luna, Alberto Santos-Zavaleta, David Velázquez-Ramírez, Julio Collado-Vides

J Biomed Semantics

Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM), A.P. 565-A, Cuernavaca, Morelos, 62100, México.

Published: May 2019

Background: The ability to express the same meaning in different ways is a well-known property of natural language, and it is a major source of difficulty in natural language processing. Given the constant increase in published literature, curation and information extraction would benefit strongly from efficient automatic processes, for which corpora of sentences evaluated by experts are a valuable resource.

Results: Because we aim to apply such approaches to the curation of the biomedical literature, specifically the literature on gene regulation in microbial organisms, we built a corpus with graded textual similarity that was evaluated by curators and designed specifically for our purposes. Based on the predefined statistical power of future analyses, we defined the features of the design, including sampling, selection criteria, balance, and size. A non-fully crossed study design was applied: each pair of sentences was evaluated by 3 annotators from a total of 7. The scale used in the semantic similarity assessment task of the Semantic Evaluation workshop (SEMEVAL) was adapted to our goals in four successive iterative sessions, with clear improvements in the agreed guidelines and in interrater reliability. Alternatives for evaluating such a corpus have been widely discussed.
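The non-fully crossed design described above (3 annotators per sentence pair, drawn from a pool of 7) can be illustrated with a minimal sketch. This is not the authors' assignment procedure, which the abstract does not specify; the function name `assign_annotators`, the round-robin strategy, and the corpus size of 70 pairs are assumptions chosen only for illustration.

```python
from collections import Counter
from itertools import combinations, cycle


def assign_annotators(n_pairs, n_annotators=7, per_pair=3):
    """Assign `per_pair` distinct annotators to each sentence pair.

    Hypothetical strategy: cycle through every possible annotator
    subset so that the workload stays balanced across annotators.
    """
    subsets = cycle(combinations(range(n_annotators), per_pair))
    return [next(subsets) for _ in range(n_pairs)]


# 70 sentence pairs is an illustrative size, not the size of the real corpus.
assignment = assign_annotators(70)

# Count how many sentence pairs each annotator is asked to rate.
load = Counter(annotator for subset in assignment for annotator in subset)
print(load)
```

With 7 annotators there are 35 distinct groups of 3, so any corpus size that is a multiple of 35 gives every annotator exactly the same number of pairs under this scheme.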

Conclusions: To the best of our knowledge, this is the first similarity corpus (a dataset of pairs of sentences for which human experts rate the semantic similarity of each pair) in this domain of knowledge. We have begun incorporating it into our research on high-throughput curation strategies based on natural language processing.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6532127
DOI: http://dx.doi.org/10.1186/s13326-019-0200-x

Similar Publications

Cost Effectiveness of Colorectal Cancer Screening Strategies in Middle- and High-Income Countries: A Systematic Review.

J Gastroenterol Hepatol

January 2025

Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi, China.

Yuxuan Li Ruyi Xia Wenwen Si Wendi Zhang Yunbo Zhang

Background And Aim: Colorectal cancer (CRC) is a significant global health burden, and screening can greatly reduce CRC incidence and mortality. Previous studies have investigated the economic effects of CRC screening. We performed a systematic review to summarize the cost-effectiveness of CRC screening strategies across countries with different income levels.

Exploring the association between pro-inflammatory mediators and sarcopenia in cancer patients through different diagnostic tools: a narrative review.

Ann Transl Med

December 2024

Post-Graduation Department, Faculty of Medical Sciences of Minas Gerais, Belo Horizonte, Brazil.

Juliana Aparecida Braga Cruz Leani Souza Maximo Pereira Daniel Steffens Ariane Vieira Carvalho Ana Paula Drummond-Lage

Background And Objective: Sarcopenia, characterized by the progressive loss of skeletal muscle mass (MM) and muscle function, is a common and debilitating condition in cancer patients, significantly impacting their quality of life, treatment outcomes, and overall survival. The pathophysiology of sarcopenia is multifactorial, involving metabolic, hormonal, and inflammatory changes. Recent research highlights the role of chronic inflammation in the development and progression of sarcopenia, with pro-inflammatory cytokines being key mediators of muscle catabolism.

Toxicity assessment and bioimaging potential of carbon dots synthesized from banana peel in zebrafish model.

Narra J

December 2024

Research Group of Pharmaceutics, School of Pharmacy, Institut Teknologi Bandung, Bandung, Indonesia.

Ni Pad Wijayanti Fitri A Permatasari Sophi Damayanti Kusnandar Anggadiredja Fery Iskandar

Zebrafish serve as a pivotal model for bioimaging and toxicity assessments; however, the toxicity of banana peel-derived carbon dots in zebrafish has not been previously reported. The aim of this study was to assess the toxicity of carbon dots derived from banana peel in zebrafish, focusing on two types prepared through hydrothermal and pyrolysis methods. Carbon dots were synthesized from banana peels using hydrothermal and pyrolysis techniques and then compared for characteristics, bioimaging ability, and toxicity in zebrafish as an animal model.

AxLaM: energy-efficient accelerator design for language models for edge computing.

Philos Trans A Math Phys Eng Sci

January 2025

Indian Institute of Technology Gandhinagar, Gandhinagar, Gujarat, India.

Tom Glint Bhumika Mittal Santripta Sharma Abdul Qadir Ronak Abhinav Goud

Modern language models such as bidirectional encoder representations from transformers have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. A data-flow-aware hardware accelerator design for language models, inspired by Simba, uses approximate fixed-point POSIT-based multipliers and high bandwidth memory (HBM) to achieve significant improvements in computational efficiency, power consumption, area and latency compared to the hardware-realized scalable accelerator Simba.

Context-dependent similarity analysis of analogue series for structure-activity relationship transfer based on a concept from natural language processing.

J Cheminform

January 2025

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.

Atsushi Yoshimori Jürgen Bajorath

Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure-activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-dependent similarity assessment based on vector embeddings adapted from natural language processing.
