Unlabelled: The recognition of gene/protein names in literature is one of the pivotal steps in the processing of biological literatures for information extraction or data mining. We have compiled a lexicon of biomedical words (conserved patterns/ potential motifs) which has the combination of only 20 alphabets of amino acids. The remaining 6 letters of the English alphabets (B, J, O, U, X, Z) are treated as invalid amino acid characters (to our context), We have jumbled the 6 letters for the sake of usage and convenience and termed as 'JUZBOX' and these characters were filtered in the biomedical lexicon.
View Article and Find Full Text PDF