Publications by authors named "Jose A Minarro-Gimenez"

Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking.

View Article and Find Full Text PDF

SNOMED CT postcoordination is an underused mechanism that can help to implement advanced systems for the automatic extraction and encoding of clinical information from text. It allows defining non-existing SNOMED CT concepts by their relationships with existing ones. Manually building postcoordinated expressions is a difficult task.

View Article and Find Full Text PDF

Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data and, thus, creating the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain.

View Article and Find Full Text PDF

Background: Semantic interoperability of eHealth services within and across countries has been the main topic in several research projects. It is a key consideration for the European Commission to overcome the complexity of making different health information systems work together. This paper describes a study within the EU-funded project ASSESS CT, which focuses on assessing the potential of SNOMED CT as core reference terminology for semantic interoperability at European level.

View Article and Find Full Text PDF

SNOMED CT provides about 300,000 codes with fine-grained concept definitions to support interoperability of health data. Coding clinical texts with medical terminologies it is not a trivial task and is prone to disagreements between coders. We conducted a qualitative analysis to identify sources of disagreements on an annotation experiment which used a subset of SNOMED CT with some restrictions.

View Article and Find Full Text PDF

Organised repositories of published scientific literature represent a rich source for research in knowledge representation. MEDLINE, one of the largest and most popular biomedical literature databases, provides metadata for over 24 million articles each of which is indexed using the MeSH controlled vocabulary. In order to reuse MeSH annotations for knowledge construction, we processed this data and extracted the most relevant patterns of assigned descriptors over time.

View Article and Find Full Text PDF

SNOMED CT supports post-coordination, a technique to combine clinical concepts to ontologically define more complex concepts. This technique follows the validity restrictions defined in the SNOMED CT Concept Model. Pre-coordinated expressions are compositional expressions already in SNOMED CT, whereas post-coordinated expressions extend its content.

View Article and Find Full Text PDF

The construction and publication of predications form scientific literature databases like MEDLINE is necessary due to the large amount of resources available. The main goal is to infer meaningful predicates between relevant co-occurring MeSH concepts manually annotated from MEDLINE records. The resulting predications are formed as subject-predicate-object triples.

View Article and Find Full Text PDF

Big data resources are difficult to process without a scaled hardware environment that is specifically adapted to the problem. The emergence of flexible cloud-based virtualization techniques promises solutions to this problem. This paper demonstrates how a billion of lines can be processed in a reasonable amount of time in a cloud-based environment.

View Article and Find Full Text PDF

Background: Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes difficult the integrated exploitation of such data. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating content readable by machines. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine readable semantic formats.

View Article and Find Full Text PDF

The massive accumulation of biomedical knowledge is reflected by the growth of the literature database MEDLINE with over 23 million bibliographic records. All records are manually indexed by MeSH descriptors, many of them refined by MeSH subheadings. We use subheading information to cluster types of MeSH descriptor co-occurrences in MEDLINE by processing co-occurrence information provided by the UMLS.

View Article and Find Full Text PDF

Translating huge medical terminologies like SNOMED CT is costly and time consuming. We present a methodology that acquires substring substitution rules for single words, based on the known similarity between medical words and their translations, due to their common Latin / Greek origin. Character translation rules are automatically acquired from pairs of English words and their automated translations to German.

View Article and Find Full Text PDF

Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources. Such heterogeneity makes difficult not only the generation of research-oriented dataset but also its exploitation. In recent years, the Open Data paradigm has proposed new ways for making data available in ways that sharing and integration are facilitated.

View Article and Find Full Text PDF

Background: Every year, hundreds of thousands of patients experience treatment failure or adverse drug reactions (ADRs), many of which could be prevented by pharmacogenomic testing. However, the primary knowledge needed for clinical pharmacogenomics is currently dispersed over disparate data structures and captured in unstructured or semi-structured formalizations. This is a source of potential ambiguity and complexity, making it difficult to create reliable information technology systems for enabling clinical pharmacogenomics.

View Article and Find Full Text PDF

The identification of relevant predicates between co-occurring concepts in scientific literature databases like MEDLINE is crucial for using these sources for knowledge extraction, in order to obtain meaningful biomedical predications as subject-predicate-object triples. We consider the manually assigned MeSH indexing terms (main headings and subheadings) in MEDLINE records as a rich resource for extracting a broad range of domain knowledge. In this paper, we explore the combination of a clustering method for co-occurring concepts based on their related MeSH subheadings in MEDLINE with the use of SemRep, a natural language processing engine, which extracts predications from free text documents.

View Article and Find Full Text PDF

The semantic interoperability of clinical information requires methods able to transform heterogeneous data sources from both technological and structural perspectives, into representations that facilitate the sharing of meaning. The SemanticHealthNet (SHN) project proposes using semantic content patterns for representing clinical information based on a model of meaning, preventing users from a deep knowledge on ontology and description logics formalism. In this work we propose a flexible transformation method that uses semantic content patterns to guide the mapping between the source data and a target domain ontology.

View Article and Find Full Text PDF

With the rapidly growing amount of biomedical literature it becomes increasingly difficult to find relevant information quickly and reliably. In this study we applied the word2vec deep learning toolkit to medical corpora to test its potential for improving the accessibility of medical knowledge. We evaluated the efficiency of word2vec in identifying properties of pharmaceuticals based on mid-sized, unstructured medical text corpora without any additional background knowledge.

View Article and Find Full Text PDF

The availability of pharmacogenomic data of individual patients can significantly improve physicians' prescribing behavior, lead to a reduced incidence of adverse drug events and an improvement of effectiveness of treatment. The Medicine Safety Code (MSC) initiative is an effort to improve the ability of clinicians and patients to share pharmacogenomic data and to use it at the point of care. The MSC is a standardized two-dimensional barcode that captures individual pharmacogenomic data.

View Article and Find Full Text PDF

The availability of pharmacogenomic data of individual patients can significantly improve physicians' prescribing behavior, lead to a reduced incidence of adverse drug events and an improvement of effectiveness of treatment. The Medicine Safety Code (MSC) initiative is an effort to improve the ability of clinicians and patients to share pharmacogenomic data and to use it at the point of care. The MSC is a standardized two-dimensional barcode that captures individual pharmacogenomic data.

View Article and Find Full Text PDF

Background: The development of genotyping and genetic sequencing techniques and their evolution towards low costs and quick turnaround have encouraged a wide range of applications. One of the most promising applications is pharmacogenomics, where genetic profiles are used to predict the most suitable drugs and drug dosages for the individual patient. This approach aims to ensure appropriate medical treatment and avoid, or properly manage, undesired side effects.

View Article and Find Full Text PDF

Genome sequencing projects generate vast amounts of data of a wide variety of types and complexities, and at a growing pace. Traditionally, the annotation of such sequences was difficult to share with other researchers. Despite the fact that this has improved with the development and application of biological ontologies, such annotation efforts remain isolated since the amount of information that can be used from other annotation projects is limited.

View Article and Find Full Text PDF

Possibly the most important requirement to support co-operative work among health professionals and institutions is the ability of sharing EHRs in a meaningful way, and it is widely acknowledged that standardization of data and concepts is a prerequisite to achieve semantic interoperability in any domain. Different international organizations are working on the definition of EHR architectures but the lack of tools that implement them hinders their broad adoption. In this paper we present ResearchEHR, a software platform whose objective is to facilitate the practical application of EHR standards as a way of reaching the desired semantic interoperability.

View Article and Find Full Text PDF

Semantic Web technologies like RDF and OWL are currently applied in life sciences to improve knowledge management by integrating disparate information. Many of the systems that perform such task, however, only offer a SPARQL query interface, which is difficult to use for life scientists. We present the OGO system, which consists of a knowledge base that integrates information of orthologous sequences and genetic diseases, providing an easy to use ontology-constrain driven query interface.

View Article and Find Full Text PDF

Background: There exist several information resources about orthology of genes and proteins, and there are also systems for querying those resources in an integrated way. However, caveats with current approaches include lack of integration, since results are shown sequentially by resource, meaning that there is redundant information and the users are required to combine the results obtained manually.

Results: In this paper we have applied the Ontological Gene Orthology approach, which makes use of a domain ontology to integrate the information output from selected orthology resources.

View Article and Find Full Text PDF