Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in the literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect, and some appeared to have a detrimental effect on precision.
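
The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three rules shown are illustrative stand-ins rather than the 23 rules evaluated in the paper, the identifier `G1` and the sample names are toy data, and the sketch assumes that the source databases already share gene identifiers, which in practice requires a separate mapping step.

```python
import re
from collections import defaultdict


def combine_dictionaries(*sources):
    """Merge {gene_id: [names]} mappings by taking the union of names per identifier."""
    combined = defaultdict(set)
    for source in sources:
        for gene_id, names in source.items():
            combined[gene_id].update(names)
    return dict(combined)


# Hypothetical spelling-variation rules, for illustration only.
def hyphen_to_space(name):
    return name.replace("-", " ")


def drop_hyphen(name):
    return name.replace("-", "")


def split_letter_digit(name):
    # Insert a space between a letter and an immediately following digit.
    return re.sub(r"([A-Za-z])(\d)", r"\1 \2", name)


RULES = [hyphen_to_space, drop_hyphen, split_letter_digit]


def expand_dictionary(dictionary, rules=RULES):
    """Add one rule-generated variant per (name, rule) pair to each entry."""
    expanded = {}
    for gene_id, names in dictionary.items():
        variants = set(names)
        for name in names:
            for rule in rules:
                variants.add(rule(name))
        expanded[gene_id] = variants
    return expanded


def recall(dictionary, gold_mentions):
    """Fraction of gold-standard mentions found in the dictionary (case-insensitive)."""
    known = {name.lower() for names in dictionary.values() for name in names}
    found = sum(1 for mention in gold_mentions if mention.lower() in known)
    return found / len(gold_mentions)


# Toy data: two "databases" that list different names for the same identifier.
db_a = {"G1": ["IL-2"]}
db_b = {"G1": ["interleukin 2"]}
gold = ["IL 2", "interleukin 2", "IL2", "TCGF"]

combined = combine_dictionaries(db_a, db_b)
expanded = expand_dictionary(combined)

print(recall(db_a, gold), recall(combined, gold), recall(expanded, gold))
```

On this toy data, recall rises from the single database to the combined dictionary and again after rule expansion, which mirrors the direction of the reported results only qualitatively. Measuring precision would additionally require checking how many generated variants match text that is not a gene or protein mention, which this sketch does not do.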


Source
http://dx.doi.org/10.1016/j.jbi.2006.09.002


