Publications by authors named "Michael Halper"

Background: Ontologies house various kinds of domain knowledge in formal structures, primarily in the form of concepts and the associative relationships between them. Ontologies have become integral components of many health information processing environments. Hence, quality assurance of the conceptual content of any ontology is critical.

Many major medical ontologies go through a regular (bi-annual, monthly, etc.) release cycle. A new release will contain corrections to the previous release, as well as genuinely new concepts that are the result of either user requests or new developments in the domain.

Maintenance and use of a large ontology, consisting of thousands of knowledge assertions, are hampered by its scope and complexity. It is important to provide tools for summarization of ontology content in order to facilitate user "big picture" comprehension. We present a parameterized methodology for the semi-automatic summarization of major topics in an ontology, based on a compact summary of the ontology, called an "aggregate partial-area taxonomy", followed by manual enhancement.

Ontologies are important components of health information management systems. As such, the quality of their content is of paramount importance. It has been proven to be practical to develop quality assurance (QA) methodologies based on automated identification of sets of concepts expected to have higher likelihood of errors.

Objectives: Ontologies are knowledge structures that lend support to many health-information systems. A study is carried out to assess the quality of ontological concepts based on a measure of their complexity. The results show a relation between complexity of concepts and error rates of concepts.

The purpose of the Big Data to Knowledge initiative is to develop methods for discovering new knowledge from large amounts of data. However, if the resulting knowledge is so large that it resists comprehension, referred to here as Big Knowledge (BK), how can it be used properly and creatively? We call this secondary challenge Big Knowledge to Use. Without a high-level mental representation of the kinds of knowledge in a BK knowledgebase, effective or innovative use of the knowledge may be limited.

The gene ontology (GO) is used extensively in the field of genomics. Like other large and complex ontologies, quality assurance (QA) efforts for GO's content can be laborious and time consuming. Abstraction networks (AbNs) are summarization networks that reveal and highlight high-level structural and hierarchical aggregation patterns in an ontology.

The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is an extensive reference terminology with an attendant amount of complexity. It has been updated continuously and revisions have been released semi-annually to meet users' needs and to reflect the results of quality assurance (QA) activities. Two measures based on structural features are proposed to track the effects of both natural terminology growth and QA activities based on aspects of the complexity of SNOMED CT.

Objective: Terminologies and terminological systems have assumed important roles in many medical information processing environments, giving rise to the "big knowledge" challenge when terminological content comprises tens of thousands to millions of concepts arranged in a tangled web of relationships. Use and maintenance of knowledge structures on that scale can be daunting. The notion of abstraction network is presented as a means of facilitating the usability, comprehensibility, visualization, and quality assurance of terminologies.

Abstraction networks are compact summarizations of terminologies used to support orientation and terminology quality assurance (TQA). Area taxonomies and partial-area taxonomies are abstraction networks that have been successfully employed in support of TQA of small SNOMED CT hierarchies. However, nearly half of SNOMED CT's concepts are in the large Procedure and Clinical Finding hierarchies.

BioPortal contains over 300 ontologies, for which quality assurance (QA) is critical. Abstraction networks (ANs), compact summarizations of ontology structure and content, have been used in such QA efforts, typically in a "one-off" manner for a single ontology. Ontologies can be characterized, independently of knowledge-content focus, from a structural standpoint, leading to the formulation of ontology families.

Objective: By 2015, SNOMED CT (SCT) will become the USA's standard for encoding diagnoses and problem lists in electronic health records (EHRs). To facilitate this effort, the National Library of Medicine has published the "SCT Clinical Observations Recording and Encoding" and the "Veterans Health Administration and Kaiser Permanente" problem lists (collectively, the "PL"). The PL is studied in regard to its readiness to support meaningful use of EHRs.

An abstraction network is an auxiliary network of nodes and links that provides a compact, high-level view of an ontology. Such a view lends support to ontology orientation, comprehension, and quality-assurance efforts. A methodology is presented for deriving a kind of abstraction network, called a partial-area taxonomy, for the Ontology of Clinical Research (OCRe).

As SNOMED usage becomes more ingrained within applications, its range of concept descriptors, and particularly its synonym adequacy, becomes more important. A simulated clinical scenario involving various term-based concept searches is used to assess whether SNOMED's concept descriptors provide sufficient differentiation to enable possible concept selection between similar terms. Four random samples from different SNOMED concept populations are utilized.

Background: Terms representing chemical concepts found in the Unified Medical Language System (UMLS) are used to derive an expanded semantic network with mutually exclusive semantic types. The UMLS Semantic Network (SN) is composed of a collection of broad categories called semantic types (STs) that are assigned to concepts. Within the UMLS's coverage of the chemical domain, we find a great many concepts being assigned more than one ST.

A cycle in the parent relationship hierarchy of the UMLS is a configuration that effectively makes some concept(s) an ancestor of itself. Such a structural inconsistency can easily be found automatically. A previous strategy for disconnecting cycles is to break them with the deletion of one or more parent relationships, irrespective of the correctness of the deleted relationships.
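
The automatic detection mentioned above can be illustrated with a standard depth-first search. The following is a minimal sketch, not the method of the article: the function name `find_cycle` and the dictionary-based representation of parent relationships are assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def find_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of parent links (first == last concept), or None.

    `parents` maps each concept to its list of parents. This data layout is
    a hypothetical stand-in, not the UMLS distribution format.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for parent in parents.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # The parent is already on the current path: a concept is
                # an ancestor of itself.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for concept in list(parents):
        if color.get(concept, WHITE) == WHITE:
            cycle = visit(concept)
            if cycle:
                return cycle
    return None


# Toy hierarchy: A -> B -> C -> A forms a cycle; D merely points into it.
toy = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(find_cycle(toy))
```

The sketch only locates a cycle; deciding which parent relationship in it is actually incorrect remains a matter for an auditor. For a hierarchy with very long parent chains, an iterative traversal would avoid Python's recursion limit.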

Auditors of a large terminology, such as SNOMED CT, face a daunting challenge. To aid them in their efforts, it is essential to devise techniques that can automatically identify concepts warranting special attention. "Complex" concepts, which by their very nature are more difficult to model, fall neatly into this category.

An algorithmically-derived abstraction network, called the partial-area taxonomy, for a SNOMED hierarchy has led to the identification of concepts considered complex. The designation "complex" is arrived at automatically on the basis of structural analyses of overlap among the constituent concept groups of the partial-area taxonomy. Such complex concepts, called overlapping concepts, constitute a tangled portion of a hierarchy and can be obstacles to users trying to gain an understanding of the hierarchy's content.

Objective: The Unified Medical Language System (UMLS) integrates terms from different sources into concepts and supplements these with the assignment of one or more high-level semantic types (STs) from its Semantic Network (SN). For a composite organic chemical concept, multiple assignments of organic chemical STs often serve to enumerate the types of the composite's underlying chemical constituents. This practice sometimes leads to the introduction of a forbidden redundant ST assignment, where both an ST and one of its descendants are assigned to the same concept.
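
The redundancy condition described above (an ST assigned together with one of its descendants) can be checked mechanically. The sketch below is illustrative only: it assumes each ST has at most one parent, and the three-node hierarchy is a toy example rather than a copy of the Semantic Network. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ancestors(st: str, st_parent: Dict[str, str]) -> Set[str]:
    """Collect all ancestors of `st` by following single-parent links."""
    result: Set[str] = set()
    current = st_parent.get(st)
    while current is not None and current not in result:
        result.add(current)
        current = st_parent.get(current)
    return result


def redundant_assignments(
    assigned: Set[str], st_parent: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (ancestor, descendant) pairs that are both assigned to a concept."""
    pairs: List[Tuple[str, str]] = []
    for st in sorted(assigned):
        for anc in sorted(ancestors(st, st_parent) & assigned):
            pairs.append((anc, st))
    return pairs


# Toy hierarchy (assumption, for illustration only).
toy_parent = {
    "Steroid": "Lipid",
    "Lipid": "Organic Chemical",
    "Carbohydrate": "Organic Chemical",
}

# A concept assigned both "Lipid" and its descendant "Steroid" is flagged.
print(redundant_assignments({"Lipid", "Steroid"}, toy_parent))
# Two STs on separate branches are not redundant.
print(redundant_assignments({"Steroid", "Carbohydrate"}, toy_parent))
```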

The UMLS contains terms from many sources. Every update of a source requires reintegration. Each new term needs to be assigned to a preexisting UMLS concept, or a new concept must be created.

Concepts whose terms are of a similar word structure are expected to have similar logical representations. Anecdotal examples from SNOMED CT indicate that this may not always be the case. An investigation into the extent of inconsistent modeling in SNOMED CT hierarchies is carried out.

Objective: Each Unified Medical Language System (UMLS) concept is assigned one or more semantic types (STs). A dynamic methodology for aiding an auditor in finding concepts that are missing the assignment of a given ST, S, is presented.

Design: The first part of the methodology exploits the previously introduced Refined Semantic Network and accompanying refined semantic types (RST) to help narrow the search space for offending concepts.

The UMLS's integration of more than 100 source vocabularies, not necessarily consistent with one another, causes some inconsistencies. The purpose of auditing the UMLS is to detect such inconsistencies and to suggest how to resolve them while observing the requirement of fully representing the content of each source in the UMLS. A software tool, called the Neighborhood Auditing Tool (NAT), that facilitates UMLS auditing is presented.

Gene terminologies are playing an increasingly important role in the ever-growing field of genomic research. While errors in large, complex terminologies are inevitable, gene terminologies are even more susceptible to them due to the rapid growth of genomic knowledge and the nature of its discovery. It is therefore very important to establish quality-assurance protocols for such genomic-knowledge repositories.

Objective: Synonym-substitution algorithms have been developed for the purpose of matching source vocabulary terms with existing Unified Medical Language System (UMLS) terms during the integration process. A drawback is the possible explosion in the number of newly generated (potential) synonyms, which can tax computational and expert review resources. Experiments are run using a synonym-substitution approach based on WordNet to see how constraining two methodological parameters, namely, "maximum number of substitutions per term" and "maximum term length," affects performance.
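
To make the two parameters concrete, here is a minimal sketch of constrained synonym substitution. It is not the algorithm from the article: a small hand-written synonym table stands in for WordNet, "term length" is assumed to mean the number of words, terms over the limit are assumed to be skipped entirely, and the function name `generate_variants` is hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List, Set


def generate_variants(
    term: str,
    synonyms: Dict[str, List[str]],
    max_substitutions: int = 2,
    max_term_length: int = 5,
) -> Set[str]:
    """Generate potential synonyms of `term` by swapping words for synonyms.

    At most `max_substitutions` words are replaced in any one variant, and
    terms longer than `max_term_length` words produce no variants at all.
    """
    words = term.lower().split()
    if len(words) > max_term_length:
        return set()

    # Positions of words that have at least one known synonym.
    replaceable = [i for i, w in enumerate(words) if synonyms.get(w)]

    variants: Set[str] = set()
    for k in range(1, min(max_substitutions, len(replaceable)) + 1):
        for positions in combinations(replaceable, k):
            choices = [synonyms[words[i]] for i in positions]
            for picks in product(*choices):
                new_words = list(words)
                for i, pick in zip(positions, picks):
                    new_words[i] = pick
                variants.add(" ".join(new_words))
    return variants


# Toy synonym table (assumption; a real run would draw on WordNet).
toy_synonyms = {"kidney": ["renal"], "stone": ["calculus"]}
print(sorted(generate_variants("kidney stone", toy_synonyms, max_substitutions=1)))
print(sorted(generate_variants("kidney stone", toy_synonyms, max_substitutions=2)))
```

Without the caps, the number of variants grows with the product of the synonym counts of every replaceable word, which is the explosion the abstract refers to; lowering either parameter trades coverage for fewer candidates to review.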
