Article Abstract

Background: Existing bacterial culture test results for infectious diseases are recorded as unrefined free text, which causes many problems, including typographical errors and stop words. Effective spelling correction is needed to ensure the accuracy and reliability of data used to study infectious diseases, including for medical terminology extraction. When a dictionary is available, edit distance-based spelling algorithms are efficient. In the absence of a dictionary, however, traditional spelling correction algorithms that rely only on edit distance have limitations.
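For context, the dictionary-based baseline described above can be sketched as a plain Levenshtein edit distance plus a nearest-entry lookup. This is a minimal illustration, not code from the article; the function names, the toy term list, and the `max_dist` threshold of 2 are all assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_with_dictionary(word: str, dictionary: set[str], max_dist: int = 2) -> str:
    """Hypothetical baseline: return the closest dictionary entry within
    `max_dist` edits, or `word` unchanged if it is known or nothing is close."""
    if word in dictionary or not dictionary:
        return word
    # Ties in distance are broken alphabetically so the result is deterministic.
    best = min(dictionary, key=lambda entry: (levenshtein(word, entry), entry))
    return best if levenshtein(word, best) <= max_dist else word


terms = {"staphylococcus", "streptococcus", "enterococcus"}
print(correct_with_dictionary("staphylococus", terms))  # → staphylococcus
```

With an empty dictionary the function can only return its input unchanged, which is exactly the situation the proposed embedding-based approach is meant to handle.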

Objective: In this research, we propose a similarity-based spelling correction algorithm that uses pretrained word embeddings produced with the BioWordVec technique. Rather than the existing rule-based methods, this approach uses a distributed representation based on character-level n-grams, learned without supervision. In other words, we propose a framework that detects and corrects typographical errors when no dictionary is available.
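Why character n-grams help with typos can be shown without any trained model. The sketch below assumes fastText-style subwords (n-grams of length 3 to 6 with `<` and `>` boundary markers, fastText's defaults; whether BioWordVec used exactly these settings is an assumption here). If every n-gram had its own orthogonal one-hot vector, summing a word's n-gram vectors would give exactly the count vector built here; learned embeddings replace those one-hot vectors with trained dense ones. The function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> Counter:
    """Count the character n-grams of `word` wrapped in boundary markers."""
    padded = f"<{word}>"
    grams = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def subword_similarity(w1: str, w2: str) -> float:
    """Similarity of two words through their shared character n-grams."""
    return cosine(char_ngrams(w1), char_ngrams(w2))


# A one-letter typo keeps most n-grams of the intended word,
# while a different organism shares only a few.
print(subword_similarity("staphylococus", "staphylococcus")
      > subword_similarity("staphylococus", "enterococcus"))  # → True
```

Because a misspelling still shares most of its subwords with the intended term, it lands near that term in embedding space even though no dictionary lists the misspelling.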

Methods: For each detected typographical error that could not be mapped to a Systematized Nomenclature of Medicine (SNOMED) clinical term, a group of highly similar correction candidates, taking edit distance into account, was generated from the clinical database using pretrained word embeddings. A grid search over the embedding matrix, whose vocabulary is sorted in descending order of frequency, was used to find candidate groups of similar words. The candidate words were then ranked by word frequency, and each typographical error was corrected according to this ranking.
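The candidate-generation and ranking steps in the Methods paragraph can be sketched as follows, under loud assumptions: the embedding vectors and frequency counts are made-up toy values (not data from the study), the thresholds `min_sim=0.8` and `max_dist=2` are hypothetical, and the grid search itself is not reproduced because the abstract does not give its parameters. Candidates must be both close in embedding space and within a small edit distance; ranking by corpus frequency then favors the correct spelling, since individual typos tend to be rare.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def correct(typo: str, embeddings: dict[str, list[float]],
            frequency: dict[str, int], min_sim: float = 0.8,
            max_dist: int = 2) -> str:
    """Hypothetical sketch: keep words that are similar in embedding space AND
    close in edit distance, then return the most frequent one.
    Assumes `typo` itself has a vector in `embeddings` (KeyError otherwise)."""
    query = embeddings[typo]
    candidates = [
        w for w in embeddings
        if w != typo
        and cosine(query, embeddings[w]) >= min_sim
        and levenshtein(typo, w) <= max_dist
    ]
    if not candidates:
        return typo  # nothing plausible: leave the word unchanged
    # Most frequent first; ties broken by higher similarity.
    candidates.sort(key=lambda w: (frequency.get(w, 0), cosine(query, embeddings[w])),
                    reverse=True)
    return candidates[0]


# Toy, made-up vectors and counts for illustration only.
embeddings = {
    "staphylococcus": [0.90, 0.10, 0.00],
    "staphylococus": [0.88, 0.12, 0.01],
    "staphyloccocus": [0.87, 0.11, 0.02],
    "streptococcus": [0.20, 0.90, 0.10],
}
frequency = {"staphylococcus": 5000, "staphylococus": 3,
             "staphyloccocus": 2, "streptococcus": 4000}
print(correct("staphylococus", embeddings, frequency))  # → staphylococcus
```

Requiring both conditions is what lets the approach work without a dictionary: the embedding filter supplies semantically related words from the corpus itself, and the edit-distance filter keeps only plausible spelling variants of the typo.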

Results: Bacterial identification words were extracted from 27,544 bacterial culture and antimicrobial susceptibility reports, and 16 types of spelling errors and 914 misspelled words were identified. The proposed similarity-based spelling correction algorithm using BioWordVec corrected 12 of the 16 types of typographical errors and achieved an F1 score of 97.48% across all spelling errors.
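The 97.48% figure is an F1 score, not a raw share of corrected words. Under the standard definitions it is the harmonic mean of precision and recall. How the article counts true positives (TP), false positives (FP), and false negatives (FN) for corrections is not stated in the abstract; a common reading is TP = misspellings corrected to the right term, FP = words changed incorrectly, FN = misspellings left uncorrected.

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```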

Conclusions: Without a dictionary, this tool effectively corrected spelling errors in the bacterial identification words of bacterial culture and antimicrobial susceptibility reports. This method will help build a high-quality, refined database from the vast text data in electronic health records.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7939936
DOI: http://dx.doi.org/10.2196/25530

Similar Publications

Many aspects of human performance require producing sequences of items in serial order. The current study takes a multiple-case approach to investigate whether the system responsible for serial order is shared across cognitive domains, focusing on working memory (WM) and word production. Serial order performance in three individuals with post-stroke language and verbal WM disorders (hereafter, persons with aphasia [PWAs]) was assessed using recognition and recall tasks for verbal and visuospatial WM. Error analyses of spoken and written production tasks were also used to assess whether participants tended to produce the correct phonemes/letters in the wrong order.

Background And Objectives: Anthracycline chemotherapy is a cornerstone in pediatric oncology but carries a significant risk of cardiotoxicity. The early detection of cardiac dysfunction is crucial for timely intervention. This study aims to evaluate the predictive value of combining speckle tracking echocardiography (STE) parameters with traditional cardiac biomarkers for the early detection of anthracycline-induced cardiotoxicity in pediatric oncology patients.

Purpose: Information and communication technologies are crucial for social and professional integration, but access to technology can be difficult for people with physical impairments. Text entry can be slow and tiring. We developed a free, open-source module called … for use with French-language AAC (augmentative/alternative communication) software.

Introduction: Tracheomalacia (TM) often occurs in children with oesophageal atresia (OA), leading to recurrent respiratory symptoms and, in severe cases, to blue spells or, ultimately, respiratory arrest. In some patients, a secondary posterior tracheopexy may then be indicated. This secondary surgery, as well as the respiratory morbidity, may be prevented by performing a primary posterior tracheopexy (PPT) concurrently with the primary OA correction.

A new species of the Neotropical genus Andrade (Hemiptera, Membracidae), with a new country record for the genus.

Zookeys

November 2024

Systematic Entomology Laboratory, Agricultural Research Service, U.S. Department of Agriculture, c/o National Museum of Natural History, P.O. Box 37012, Washington, D.C. 20013, USA.

The new species, from Bolivia and French Guiana, closely resembles Andrade in being brightly colored but differs in the metathoracic tibial chaetotaxy, the male pygofer, the first anal segment, the aedeagus, and the color pattern. In this new species, which is larger than …, females are larger than males.
