Improving the performance of dictionary-based approaches in protein name recognition.

J Biomed Inform

CREST, Japan Science and Technology (JST) Agency, Honcho 4-1-8, Kawaguchi-shi, Saitama 332-0012, Japan.

Published: December 2004

Dictionary-based protein name recognition is often a first step in extracting information from biomedical documents because it can provide ID information on recognized terms. However, dictionary-based approaches present two fundamental difficulties: (1) false recognition, mainly caused by short names; and (2) low recall due to spelling variations. In this paper, we tackle the former problem by using machine learning to filter out false positives, and we present two alternative methods for alleviating the latter problem of spelling variations. The first uses approximate string searching, and the second expands the dictionary with a probabilistic variant generator, which we propose in this paper. Experimental results using the GENIA corpus revealed that filtering with a naive Bayes classifier greatly improved precision with only a slight loss of recall, resulting in a 10.8% improvement in F-measure, and that dictionary expansion with the variant generator gave a further 1.6% improvement and achieved an F-measure of 66.6%.
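To make the idea of approximate string searching against a dictionary concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes plain Levenshtein distance as the similarity measure, and the function names, the `max_dist` and `max_len` parameters, and the entry IDs `P001` and `P002` are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_candidates(tokens, dictionary, max_dist=1, max_len=3):
    """Return (start, end, surface, entry_id, distance) for each token span
    of up to max_len tokens that lies within max_dist edits of a dictionary
    name. Comparison is case-insensitive (an assumption of this sketch)."""
    hits = []
    for start in range(len(tokens)):
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            surface = " ".join(tokens[start:end])
            for name, entry_id in dictionary.items():
                d = edit_distance(surface.lower(), name.lower())
                if d <= max_dist:
                    hits.append((start, end, surface, entry_id, d))
    return hits


# Hypothetical two-entry dictionary mapping protein names to made-up IDs.
toy_dictionary = {"NF-kappa B": "P001", "interleukin 2": "P002"}
toy_tokens = "activation of NF-kappaB and interleukin-2".split()
toy_hits = find_candidates(toy_tokens, toy_dictionary)
# toy_hits == [(2, 3, "NF-kappaB", "P001", 1),
#              (4, 5, "interleukin-2", "P002", 1)]
```

Both spelling variants are recovered even though neither matches its dictionary entry exactly. A fixed edit-distance threshold also lets short names match unrelated strings easily, which is the kind of false positive a downstream filter would need to remove.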

Source
http://dx.doi.org/10.1016/j.jbi.2004.08.003

Similar Publications

5956 German affective norms for atmospheres in organizations (GANAiO).

Behav Res Methods

December 2024

Department of Business Administration and Economics, FernUniversität in Hagen, Hagen, Germany.

This article develops a comprehensive database comprising 5956 German affective norms specifically tailored for the study of organizational atmospheres through computational verbal language analysis. This dictionary adopts both dimensional and categorical approaches. The theoretical foundation of this study is the circumplex model of affective atmospheres.

Knowledge mining of brain connectivity in massive literature based on transfer learning.

Bioinformatics

November 2024

Britton Chance Center for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China.

Motivation: Neuroscientists have long endeavored to map brain connectivity, yet the intricate nature of brain networks often leads them to concentrate on specific regions, hindering efforts to unveil a comprehensive connectivity map. Recent advancements in imaging and text mining techniques have enabled the accumulation of a vast body of literature containing valuable insights into brain connectivity, facilitating the extraction of whole-brain connectivity relations from this corpus. However, the diverse representations of brain region names and connectivity relations pose a challenge for conventional machine learning methods and dictionary-based approaches in identifying all instances accurately.

To understand and measure political information consumption in the high-choice media environment, we need new methods to trace individual interactions with online content and novel techniques to analyse and detect politics-related information. In this paper, we report the results of a comparative analysis of the performance of automated content analysis techniques for detecting political content in the German language across different platforms. Using three validation datasets, we compare the performance of three groups of detection techniques relying on dictionaries, classic supervised machine learning, and deep learning.

Impact of walking on knee articular cartilage T2 values estimated with a dictionary-based approach - A pilot study.

Radiography (Lond)

November 2024

Centro Hospitalar Universitário de Santo António, Unidade Local de Saúde de Santo António, Orthopedic Department, Porto, Portugal; ICBAS, School of Medicine and Biomedical Sciences, University of Porto, Portugal.

Article Synopsis
  • Walking is important for the health of knee articular cartilage, but traditional MRI methods don't effectively detect early changes in cartilage composition due to exercise.
  • This study involved seven healthy volunteers, who had MRI scans of their knees before and after a 9-minute treadmill walk, to assess whether a quantitative T2 mapping technique could identify changes in cartilage related to water content.
  • The results showed that walking significantly increased T2 values in the knee cartilage, indicating changes in hydration that could be used for early detection of cartilage issues, particularly useful for at-risk patients.

Media coverage of depression on social media with specific framings could shape people's perceptions and attitudes, which is significant in reducing stigma and promoting support for depression sufferers. Adopting the lens of moral foundation theory (MFT), this study aims to explore the effect of inherent moral framings within depression coverage on social media on the stigma and approval attitudes toward depression in audiences' responses. A large language model and a dictionary-based approach were respectively adopted to score depression-related media coverage (n = 919) and the corresponding comments (n = 92,505) collected from the Weibo platform against MFT's five dimensions and (de)stigma attitudes.
