AI Article Synopsis

  • The ENCODE project is an expansive public database of over 4,000 experiments and 25,000 data files aimed at unlocking biological knowledge through genomic research.
  • Searching for relevant datasets within ENCODE has been challenging due to the simplicity and incompleteness of its metadata and the lack of a coherent ontology.
  • To address these limitations, the S.O.S. GeM system was developed. It supports semantic search and retrieval of ENCODE datasets through a Semantic Knowledge Base, and for biologists' queries it finds more relevant datasets than a purely syntactic search.

Article Abstract

The Encyclopedia of DNA Elements (ENCODE) is a large and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007. Previously unknown biological knowledge can be extracted from these largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet searching for datasets relevant to knowledge discovery is poorly supported: the metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata search approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base starting from concepts extracted from ENCODE metadata, which we matched to and expanded with biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This correctly finds more datasets than a purely syntactic search, which is what the other available systems support. We empirically show the relevance of the found datasets to the biologists' queries.
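
The difference between syntactic and semantic search described in the abstract can be illustrated with a minimal sketch. It is not the S.O.S. GeM implementation and does not use the Unified Medical Language System: the toy ontology `NARROWER`, the metadata in `DATASETS`, and the functions `expand`, `syntactic_search`, and `semantic_search` are all hypothetical names introduced here for illustration. The sketch assumes that a query term is expanded to its transitively narrower concepts before matching against metadata text.

```python
from collections import deque

# Hypothetical toy ontology (an assumption, not UMLS content):
# each concept maps to its direct narrower ("is-a" child) concepts.
NARROWER = {
    "cancer": ["carcinoma", "leukemia"],
    "carcinoma": ["hepatocellular carcinoma"],
    "leukemia": ["chronic myelogenous leukemia"],
}

# Hypothetical dataset metadata (illustrative, not real ENCODE records):
# dataset id -> free-text description.
DATASETS = {
    "ds1": "ChIP-seq on HepG2, hepatocellular carcinoma cell line",
    "ds2": "RNA-seq on K562, chronic myelogenous leukemia cell line",
    "ds3": "DNase-seq on GM12878, lymphoblastoid cell line",
    "ds4": "ChIP-seq on a cancer cell line",
}


def expand(term):
    """Return the term plus all transitively narrower concepts (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in NARROWER.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def syntactic_search(term):
    """Match only datasets whose metadata literally contains the query term."""
    term = term.lower()
    return sorted(d for d, text in DATASETS.items() if term in text.lower())


def semantic_search(term):
    """Match datasets whose metadata contains the term or any narrower concept."""
    concepts = expand(term.lower())
    return sorted(
        d for d, text in DATASETS.items()
        if any(c in text.lower() for c in concepts)
    )


if __name__ == "__main__":
    print(syntactic_search("cancer"))  # literal matches only
    print(semantic_search("cancer"))   # also matches narrower concepts
```

In this toy setting, the query "cancer" matches only `ds4` syntactically, while the expanded query also retrieves `ds1` and `ds2`, whose metadata mention narrower concepts but never the word "cancer" itself. A real system would also need concept extraction from metadata and synonym handling, which this sketch omits.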

Source
http://dx.doi.org/10.1109/TCBB.2015.2495179

Publication Analysis

Top Keywords

encode datasets: 8
encode metadata: 8
semantic knowledge: 8
knowledge base: 8
biologists' queries: 8
encode: 6
knowledge: 5
datasets: 5
ontology-based search: 4
search genomic: 4

Similar Publications

Background: Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma (RCC). Due to the lack of symptoms until advanced stages, early diagnosis of ccRCC is challenging. Therefore, the identification of novel secreted biomarkers for the early detection of ccRCC is urgently needed.

Mannheimia haemolytica is one of the most common causative agents of bovine respiratory disease (BRD); however, antibiotic resistance in this species is increasing, making treatment more difficult. Integrative-conjugative elements (ICE), a subset of mobile genetic elements (MGE), encoding up to 100 genes have been reported in Mannheimia haemolytica genomes to confer multidrug resistance, including resistance to antibiotics commonly used in the treatment of BRD. However, the presence of antibiotic resistance genes (ARGs) does not always agree with phenotypic resistance.

UniAMP: enhancing AMP prediction using deep neural networks with inferred information of peptides.

BMC Bioinformatics

January 2025

College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China.

Antimicrobial peptides (AMPs) have been widely recognized as a promising solution to combat antimicrobial resistance of microorganisms due to the increasing abuse of antibiotics in medicine and agriculture around the globe. In this study, we propose UniAMP, a systematic prediction framework for discovering AMPs. We observe that feature vectors used in various existing studies constructed from peptide information, such as sequence, composition, and structure, can be augmented and even replaced by information inferred by deep learning models.

The 12-lead electrocardiogram (ECG) is inexpensive and widely available. Whether conditions across the human disease landscape can be detected using the ECG is unclear. We developed a deep learning denoising autoencoder and systematically evaluated associations between ECG encodings and ~1,600 Phecode-based diseases in three datasets separate from model development, and meta-analyzed the results.

Accurately predicting intracerebral hemorrhage (ICH) prognosis is a critical and indispensable step in the clinical management of patients post-ICH. Recently, integrating artificial intelligence, particularly deep learning, has significantly enhanced prediction accuracy and alleviated neurosurgeons from the burden of manual prognosis assessment. However, uni-modal methods have shown suboptimal performance due to the intricate pathophysiology of the ICH.
