Biology has now become an information science, and researchers are increasingly dependent on expert-curated biological databases to organize the findings from the published literature. We report here on a series of experiments related to the application of natural language processing to aid in the curation process for FlyBase. We focused on listing the normalized form of genes and gene products discussed in an article. We broke this into two steps: gene mention tagging in text, followed by normalization of gene names. For gene mention tagging, we adopted a statistical approach. To provide training data, we were able to reverse engineer the gene lists from the associated articles and abstracts, to generate text labeled (imperfectly) with gene mentions. We then evaluated the quality of the noisy training data (precision of 78%, recall 88%) and the quality of the HMM tagger output trained on this noisy data (precision 78%, recall 71%). In order to generate normalized gene lists, we explored two approaches. First, we explored simple pattern matching based on synonym lists to obtain a high recall/low precision system (recall 95%, precision 2%). Using a series of filters, we were able to improve precision to 50% with a recall of 72% (balanced F-measure of 0.59). Our second approach combined the HMM gene mention tagger with various filters to remove ambiguous mentions; this approach achieved an F-measure of 0.72 (precision 88%, recall 61%). These experiments indicate that the lexical resources provided by FlyBase are complete enough to achieve high recall on the gene list task, and that normalization requires accurate disambiguation; different strategies for tagging and normalization trade off recall for precision.
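The F-measures quoted above follow from the reported precision and recall. As a minimal sketch, assuming "balanced F-measure" means the standard harmonic mean of precision and recall (the function name `f_measure` is illustrative, not taken from the paper), the arithmetic can be checked like this:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumes the standard F1 definition; returns 0.0 when both inputs are 0.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract.
filtered_pattern_matching = f_measure(0.50, 0.72)
hmm_tagger_with_filters = f_measure(0.88, 0.61)

print(round(filtered_pattern_matching, 2))  # 0.59
print(round(hmm_tagger_with_filters, 2))    # 0.72
```

Both values agree with the figures in the abstract. The second system gives up recall for a larger gain in precision, which is why its F-measure is higher.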
DOI: http://dx.doi.org/10.1016/j.jbi.2004.08.010
BMC Musculoskelet Disord
January 2025
Medical Genetic Diagnosis and Therapy Center, Fujian Maternity and Child Health Hospital, College of Clinical Medicine for Obstetrics and Gynecology and Pediatrics, Fujian Medical University, 18 Daoshan Road, Fuzhou, 350001, China.
Background: Congenital muscular dystrophies (CMDs) and myopathies (CMYOs) are a clinically and genetically heterogeneous group of neuromuscular disorders that share common features, such as muscle weakness, hypotonia, characteristic changes on muscle biopsy and motor retardation. In this study, we recruited eleven families with early-onset neuromuscular disorders in China, aiming to clarify the underlying genetic etiology.
Methods: Essential clinical tests, such as biomedical examination, electromyography and muscle biopsy, were applied to evaluate patient phenotypes.
Food Sci Nutr
January 2025
Department of Chemistry, Thomas J. R. Faulkner College of Science and Technology, University of Liberia, Monrovia, Montserrado County, Liberia.
Citronellol (CT) is a naturally occurring lipophilic monoterpenoid which has shown anticancer effects in numerous cancerous cell lines. This study was, therefore, designed to examine CT's potential as an anticancer agent against glioblastoma (GBM). Network pharmacology analysis was employed to identify potential anticancer targets of CT.
Database (Oxford)
January 2025
School of Computer Science and Technology, Xidian University, 266 Xinglong Section of Xifeng Road, Xi'an, Shaanxi 710126, China.
The pathogenesis of complex diseases is intricately linked to various genes, and network medicine has enhanced our understanding of diseases. However, most network-based approaches ignore interactions mediated by noncoding RNAs (ncRNAs), and most databases focus only on the association between genes and diseases. To address these issues, we have developed DisGeNet, a database that focuses not only on disease-associated genes but also on the interactions among genes.
Int J Mol Sci
January 2025
Grassland and Forages Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Republic of Korea.
Light is a vital regulator of photosynthesis, energy production, plant growth, and morphogenesis. Although these key physiological processes are well understood, the effects of light quality on the pigment content, oxidative stress, reactive oxygen species (ROS) production, antioxidant defense systems, and biomass yield of plants remain largely unexplored. In this study, we applied different light-emitting diode (LED) treatments, including white light, red light, blue light, and a red+blue (1:1) light combination, to evaluate the traits mentioned above in alfalfa ( L.
Plants (Basel)
December 2024
Jiangsu Academy of Forestry, Nanjing 211153, China.
The paulownia tree belongs to the Paulowniaceae family. Paulownia is vigorous, adapts well to harsh environmental conditions, and can be used as a raw building material, in drug processing, and for other purposes. In research on MYB transcription factors of the paulownia tree, resistance to abiotic stress is rarely discussed.