Measuring semantic similarity between sentences is a significant task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining. For this reason, sentence similarity methods for the biomedical domain have attracted considerable attention in recent years. However, most sentence similarity methods and experimental results reported in the biomedical domain cannot be reproduced, for reasons that include the copying of previous results without confirmation, the lack of source code and data to replicate both methods and experiments, and the lack of a detailed definition of the experimental setup. As a consequence of this reproducibility gap, the state of the problem cannot be elucidated, nor can new lines of research be soundly established. There are also other significant gaps in the literature on biomedical sentence similarity: (1) several unexplored sentence similarity methods that deserve study have not been evaluated; (2) a benchmark on biomedical sentence similarity, called Corpus-Transcriptional-Regulation (CTR), has not been evaluated; (3) the impact of the pre-processing stage and Named Entity Recognition (NER) tools on the performance of sentence similarity methods has not been studied; and finally, (4) software and data resources for the reproducibility of methods and experiments in this line of research are lacking. To address these open problems, this registered report introduces a detailed experimental setup, together with a categorization of the literature, to develop the largest, most up-to-date, and, for the first time, reproducible experimental survey on biomedical sentence similarity.
This experimental survey will be based on our own software replication and evaluation of all studied methods on a single software platform, which will be developed specifically for this work and will become the first publicly available software library for biomedical sentence similarity. Finally, we will provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
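
To make gap (3) concrete, the following minimal sketch shows how a pre-processing choice alone can change a similarity score. It is an illustration under stated assumptions, not the platform or any method described in this report: the token-overlap (Jaccard) measure is a generic stand-in, and the function names `tokenize` and `jaccard_similarity` and the two example sentences are hypothetical.

```python
import re


def tokenize(sentence, lowercase=True, strip_punct=True):
    """Hypothetical pre-processing step (an assumption for illustration):
    optional lowercasing and punctuation removal, then whitespace splitting."""
    if lowercase:
        sentence = sentence.lower()
    if strip_punct:
        # Replace every character that is neither a word character nor whitespace.
        sentence = re.sub(r"[^\w\s]", " ", sentence)
    return set(sentence.split())


def jaccard_similarity(a, b, **prep):
    """Token-set overlap between two sentences, in the range [0, 1]."""
    tokens_a = tokenize(a, **prep)
    tokens_b = tokenize(b, **prep)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Illustrative sentence pair that differs only in casing and punctuation.
s1 = "The protein binds DNA."
s2 = "the protein binds dna"

raw = jaccard_similarity(s1, s2, lowercase=False, strip_punct=False)
clean = jaccard_similarity(s1, s2)

print(f"without pre-processing: {raw:.3f}")
print(f"with pre-processing:    {clean:.3f}")
```

The same pair of sentences scores differently depending only on the pre-processing configuration, which is why the experimental setup needs to fix and report that stage for results to be comparable.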
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7990182 | PMC |
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0248663 | PLOS |