It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements.
Metadata, the machine-readable descriptions of the data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time-consuming, with little interactive guidance or assistance provided to users.
The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses.
Background: Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets.
In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error-prone, with weak or nonexistent support for linking metadata to ontologies.
The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed, the CEDAR Workbench, is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies.
Background: Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs.
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates.
There are two key challenges hindering effective use of quantitative assessment of imaging in cancer response assessment: 1) radiologists usually describe the cancer lesions in imaging studies subjectively and sometimes ambiguously, and 2) it is difficult to repurpose imaging data, because lesion measurements are not recorded in a format that permits machine interpretation and interoperability. We have developed a freely available software platform on the basis of open standards, the electronic Physician Annotation Device (ePAD), to tackle these challenges in two ways. First, ePAD facilitates the radiologist in carrying out cancer lesion measurements as part of routine clinical trial image interpretation workflow.
Background: A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text.
AMIA Annu Symp Proc
February 2013
Biomedical ontologies are increasingly being used to improve information retrieval methods. In this paper, we present a novel information retrieval approach that exploits knowledge specified by the Semantic Web ontology and rule languages OWL and SWRL. We evaluate our approach using an autism ontology that has 156 SWRL rules defining 145 autism phenotypes.
Stud Health Technol Inform
April 2011
The Extensible Markup Language (XML) is increasingly being used for biomedical data exchange. The parallel growth in the use of ontologies in biomedicine presents opportunities for combining the two technologies to leverage the semantic reasoning services provided by ontology-based tools. There are currently no standardized approaches for taking XML-encoded biomedical information models and representing and reasoning with them using ontologies.
Artif Intell Eng Des Anal Manuf
November 2009
Problem-solving methods (PSMs) are software components that represent and encode reusable algorithms. They can be combined with representations of domain knowledge to produce intelligent application systems. A goal of research on PSMs is to provide principled methods and tools for composing and reusing algorithms in knowledge-based systems.
Identifying, tracking and reasoning about tumor lesions is a central task in cancer research and clinical practice that could potentially be automated. However, information about tumor lesions in imaging studies is not easily accessed by machines for automated reasoning. The Annotation and Image Markup (AIM) information model recently developed for the cancer Biomedical Informatics Grid provides a method for encoding the semantic information related to imaging findings, enabling their storage and transfer.
Worldwide developments concerning infectious diseases and bioterrorism are driving forces for improving aberrancy detection in public health surveillance. The performance of an aberrancy detection algorithm can be measured in terms of sensitivity, specificity and timeliness. However, these metrics are probabilistically dependent variables and there is always a trade-off between them.
Managing time-stamped data is essential to clinical research activities and often requires the use of considerable domain knowledge. Adequately representing and integrating temporal data and domain knowledge is difficult with the database technologies used in most clinical research systems. There is often a disconnect between the database representation of research data and corresponding domain knowledge of clinical research concepts.
Many biomedical research databases contain time-oriented data resulting from longitudinal, time-series and time-dependent study designs, knowledge of which is not handled explicitly by most data-analytic methods. To make use of such knowledge about research data, we have developed an ontology-driven temporal mining method, called ChronoMiner. Most mining algorithms require that data be input as a single table.
Stud Health Technol Inform
November 2007
Managing time-stamped data is essential to clinical research activities and often requires the use of considerable domain knowledge. Adequately representing this domain knowledge is difficult in relational database systems. As a result, there is a need for principled methods to overcome the disconnect between the database representation of time-oriented research data and corresponding knowledge of domain-relevant concepts.
AMIA Annu Symp Proc
September 2007
Biomedical databases contain considerable amounts of time-oriented data, which are not typically in a format suitable for querying complex temporal patterns. We address this problem in implementing Synchronus, a tool for ontology-driven mapping of data from an existing relational database to a database schema with a uniform temporal representation. We discuss the design of Synchronus, which consists of a schema-mapping ontology and a data-mapping algorithm that together provide general capabilities for database transformation.
AMIA Annu Symp Proc
September 2007
Considerable prior work has been undertaken by researchers to address the need for temporal data deduction in biomedical applications, but relatively little research has examined how to create robust, efficient approaches for such methods using large databases. We present the design and evaluation of a distributed architecture that can be dynamically optimized to perform large-scale abstraction of temporal data.
Proc IEEE Comput Syst Bioinform Conf
May 2007
With the rapid growth of biomedical research databases, opportunities for scientific inquiry have expanded quickly and led to a demand for computational methods that can extract biologically relevant patterns among vast amounts of data. A significant challenge is identifying temporal relationships among genotypic and clinical (phenotypic) data. Few software tools are available for such pattern matching, and they are not interoperable with existing databases.
Information technology can support the implementation of clinical research findings in practice settings. Technology can address the quality gap in health care by providing automated decision support to clinicians that integrates guideline knowledge with electronic patient data to present real-time, patient-specific recommendations. However, technical success in implementing decision support systems may not translate directly into system use by clinicians.
AMIA Annu Symp Proc
December 2004
Heightened concerns about bioterrorism are forcing changes to the traditional biosurveillance model. Public health departments are under pressure to follow multiple, non-specific, pre-diagnostic indicators, often drawn from many data sources. As a result, there is a need for biosurveillance systems that can rapidly integrate and process multiple diverse data feeds, applying a variety of analysis and problem-solving techniques to give timely analysis.
Clinical databases typically contain a significant amount of temporal information. This information is often crucial in medical decision-support systems. Although temporal queries are common in clinical systems, the medical informatics field has no standard means for representing or querying temporal data.