Ontology-Based Search of Genomic Metadata.

Javier D Fernandez, Maurizio Lenzerini, Marco Masseroli, Francesco Venco, Stefano Ceri

IEEE/ACM Trans Comput Biol Bioinform

Published: January 2017

The ENCODE project maintains a large public repository of more than 4,000 experiments and 25,000 data files, from which new biological knowledge can be extracted.
Searching ENCODE for relevant datasets has been difficult because its metadata are simple and incomplete and are not described by a coherent ontology.
To address these limitations, the S.O.S. GeM system was developed. It supports semantic search and retrieval of ENCODE datasets through a Semantic Knowledge Base, which lets it find more relevant datasets for biologists' queries than a purely syntactic search.

The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet support for searching relevant datasets for knowledge discovery is limited: the metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata searching approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base, starting with concepts extracted from ENCODE metadata, which are matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that the inference method used for this construction is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This allows correctly finding more datasets than are extracted by a purely syntactic search, which is what the other available systems support. We empirically show the relevance of the found datasets to the biologists' queries.
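The difference between syntactic and semantic search described above can be illustrated with a minimal sketch. It is not the S.O.S. GeM implementation: the tiny ontology, the dataset descriptions, and the function names `expand`, `syntactic_search`, and `semantic_search` are all hypothetical and chosen only for illustration. The sketch assumes that query expansion follows "narrower concept" links, which is one simple way an ontology can widen a query.

```python
from collections import deque

# Hypothetical toy ontology: concept -> narrower concepts.
# Illustrative only; it is not taken from UMLS or from S.O.S. GeM.
NARROWER = {
    "cancer": ["leukemia", "carcinoma"],
    "leukemia": ["chronic myeloid leukemia"],
}

# Hypothetical metadata records: dataset id -> free-text description.
DATASETS = {
    "ds1": "ChIP-seq on K562, chronic myeloid leukemia cell line",
    "ds2": "RNA-seq on HepG2, hepatocellular carcinoma",
    "ds3": "DNase-seq on GM12878, lymphoblastoid cell line",
    "ds4": "ChIP-seq on a cancer cell line",
}


def expand(term):
    """Return the term plus every narrower concept reachable from it."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in NARROWER.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def syntactic_search(term):
    """Match only descriptions that literally contain the query term."""
    term = term.lower()
    return sorted(d for d, text in DATASETS.items() if term in text.lower())


def semantic_search(term):
    """Match descriptions containing the term or any narrower concept."""
    terms = expand(term.lower())
    return sorted(
        d for d, text in DATASETS.items()
        if any(t in text.lower() for t in terms)
    )


print(syntactic_search("cancer"))  # only the record that says "cancer"
print(semantic_search("cancer"))   # also records naming narrower concepts
```

In this toy setting, the syntactic search for "cancer" returns only the record whose text contains that word, while the expanded search also returns the leukemia and carcinoma records. This mirrors, in a very reduced form, the claim that ontology-driven search finds more relevant datasets than plain string matching.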

DOI: http://dx.doi.org/10.1109/TCBB.2015.2495179

Publication Analysis

Top Keywords

encode datasets

encode metadata

semantic knowledge

knowledge base

biologists' queries

encode

knowledge

datasets

ontology-based search

search genomic

Similar Publications

Identification of IGFBP3 and LGALS1 as potential secreted biomarkers for clear cell renal cell carcinoma based on bioinformatics analysis and machine learning.

Adv Clin Exp Med

January 2025

Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, USA.

Wunchana Seubwai, Sakkarn Sangkhamanon, Xuhong Zhang

Background: Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma (RCC). Due to the lack of symptoms until advanced stages, early diagnosis of ccRCC is challenging. Therefore, the identification of novel secreted biomarkers for the early detection of ccRCC is urgently needed.

Phenotypic antibiotic resistance prediction using antibiotic resistance genes and machine learning models in Mannheimia haemolytica.

Vet Microbiol

January 2025

Purdue University, Department of Animal Sciences, West Lafayette, IN 47907 USA.

Carmen L Wickware, Audrey C Ellis, Mohit Verma, Timothy A Johnson

Mannheimia haemolytica is one of the most common causative agents of bovine respiratory disease (BRD); however, antibiotic resistance in this species is increasing, making treatment more difficult. Integrative-conjugative elements (ICE), a subset of mobile genetic elements (MGE), encoding up to 100 genes have been reported in Mannheimia haemolytica genomes to confer multidrug resistance, including resistance to antibiotics commonly used in the treatment of BRD. However, the presence of antibiotic resistance genes (ARGs) does not always agree with phenotypic resistance.

UniAMP: enhancing AMP prediction using deep neural networks with inferred information of peptides.

BMC Bioinformatics

January 2025

College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China.

Zixin Chen, Chengming Ji, Wenwen Xu, Jianfeng Gao, Ji Huang

Antimicrobial peptides (AMPs) have been widely recognized as a promising solution to combat antimicrobial resistance of microorganisms due to the increasing abuse of antibiotics in medicine and agriculture around the globe. In this study, we propose UniAMP, a systematic prediction framework for discovering AMPs. We observe that feature vectors used in various existing studies constructed from peptide information, such as sequence, composition, and structure, can be augmented and even replaced by information inferred by deep learning models.

Unsupervised deep learning of electrocardiograms enables scalable human disease profiling.

NPJ Digit Med

January 2025

Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.

Sam F Friedman, Shaan Khurshid, Rachael A Venn, Xin Wang, Nate Diamant

The 12-lead electrocardiogram (ECG) is inexpensive and widely available. Whether conditions across the human disease landscape can be detected using the ECG is unclear. We developed a deep learning denoising autoencoder and systematically evaluated associations between ECG encodings and ~1,600 Phecode-based diseases in three datasets separate from model development, and meta-analyzed the results.

ICH-PRNet: a cross-modal intracerebral haemorrhage prognostic prediction method using joint-attention interaction mechanism.

Neural Netw

January 2025

Medical Big Data Lab, Shenzhen Research Institute of Big Data, Shenzhen, 518172, China.

Xinlei Yu, Ahmed Elazab, Ruiquan Ge, Jichao Zhu, Lingyan Zhang

Accurately predicting intracerebral hemorrhage (ICH) prognosis is a critical and indispensable step in the clinical management of patients post-ICH. Recently, integrating artificial intelligence, particularly deep learning, has significantly enhanced prediction accuracy and alleviated neurosurgeons from the burden of manual prognosis assessment. However, uni-modal methods have shown suboptimal performance due to the intricate pathophysiology of the ICH.
