Background: This study seeks to develop, test and assess a methodology for automatically extracting a complete set of 'term-like phrases' and for creating a terminology spectrum from a collection of natural-language PDF documents in the field of chemistry. A 'term-like phrase' is defined as one or more consecutive words and/or alphanumeric strings, with unchanged spelling, that convey a specific scientific meaning. A terminology spectrum for a natural-language document is an indexed list of tagged entities, including recognized general scientific concepts, terms linked to existing thesauri, names of chemical substances/reactions, and term-like phrases.
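The two definitions above can be illustrated with a toy sketch. This is not the study's method: the function names `candidate_phrases` and `terminology_spectrum`, the tokenizer regex, the `max_len` cutoff, and the frequency ordering are all assumptions made for illustration. It counts runs of consecutive tokens with spelling left unchanged (no lowercasing or stemming) and returns an indexed list of tagged entries.

```python
import re
from collections import Counter

def candidate_phrases(text, max_len=3):
    """Count every run of 1..max_len consecutive tokens inside a sentence.

    Spelling is left untouched (no lowercasing, no stemming), following the
    'unchanged spelling' part of the definition. The tokenizer is a crude
    assumption: alphanumeric strings, optionally joined by hyphens.
    """
    counts = Counter()
    for sentence in re.split(r"[.!?;]\s+", text):
        tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts

def terminology_spectrum(text, tags, max_len=3):
    """Return an indexed list of (index, phrase, tag, count) tuples.

    `tags` is a hypothetical lookup (e.g. built from a thesaurus); any phrase
    it does not cover falls back to the generic tag 'term-like phrase'.
    """
    counts = candidate_phrases(text, max_len)
    # Most frequent first; ties broken alphabetically for a stable order.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(i, phrase, tags.get(phrase, "term-like phrase"), n)
            for i, (phrase, n) in enumerate(rows)]
```

A real pipeline would also need PDF text extraction and filtering of non-terminological n-grams, both of which this sketch omits.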
Complexity analysis is capable of highlighting those gross evolutionary changes in gene promoter regions (loosely termed "promoter shuffling") that are undetectable by conventional DNA sequence alignment. Complexity analysis was therefore used here to identify the modular components (blocks) of the orthologous beta-globin gene promoter sequences of 22 vertebrate species, from zebrafish to humans. Considerable variation between the beta-globin gene promoters was apparent in terms of block presence/absence, copy number, and relative location.