Combining learning and constraints for genome-wide protein annotation.

BMC Bioinformatics

Department of Information Engineering and Computer Science, University of Trento, Via Sommarive, 5, Povo di Trento, 38123, Italy.

Published: June 2019

Background: The advent of high-throughput experimental techniques paved the way for genome-wide computational analysis and predictive annotation studies. When jointly annotating a large set of related entities, such as all proteins of a given genome, many candidate annotations may be inconsistent with, or very unlikely given, the existing knowledge. A sound predictive framework capable of accounting for constraints of this type when making predictions could substantially improve the quality of machine-generated annotations at a genomic scale.

Results: We present OCELOT, a predictive pipeline that simultaneously addresses functional and interaction annotation of all proteins of a given genome. The system combines sequence-based predictors for function and protein-protein interaction (PPI) prediction with a consistency layer that enforces (soft) constraints expressed as fuzzy logic rules. The enforced rules represent the available prior knowledge about the classification task, including taxonomic constraints over each GO hierarchy (e.g. a protein labeled with a GO term should also be labeled with all ancestor terms) as well as rules combining interaction and function prediction. An extensive experimental evaluation on the yeast genome shows that integrating prior knowledge via rules substantially improves the quality of the predictions. The system largely outperforms GoFDR, the only high-ranking system at the last CAFA challenge with a readily available implementation, when GoFDR is given access to intra-genome information only (as OCELOT is), and achieves comparable or better results (depending on the hierarchy and performance measure) when GoFDR is allowed to use information from other genomes. Our system also compares favorably to recent methods based on deep learning.
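
As an illustration of the taxonomic constraint mentioned above (a term's score should not exceed the scores of its ancestors), the following is a minimal sketch and not OCELOT's implementation. The abstract does not say which fuzzy logic is used; the Lukasiewicz implication is assumed here. The hierarchy `PARENTS`, the term names, and all function names are hypothetical. The propagation step is a hard post-processing simplification, shown only to make the constraint concrete; it is not the soft consistency layer described in the abstract.

```python
from typing import Dict, List

# Hypothetical toy hierarchy: child term -> list of parent terms.
# The names are placeholders, not real GO identifiers.
PARENTS: Dict[str, List[str]] = {
    "term_c": ["term_b"],
    "term_b": ["term_a"],
    "term_a": [],
}


def rule_violation(child: float, parent: float) -> float:
    """Degree to which 'child => parent' is violated, for scores in [0, 1].

    Assumes the Lukasiewicz implication, whose truth value is
    min(1, 1 - child + parent); the violation is 1 minus that value,
    which simplifies to max(0, child - parent).
    """
    return max(0.0, child - parent)


def hierarchy_violation(scores: Dict[str, float]) -> float:
    """Total violation summed over every child => parent rule."""
    return sum(
        rule_violation(scores[child], scores[parent])
        for child, parents in PARENTS.items()
        for parent in parents
    )


def propagate_to_ancestors(scores: Dict[str, float]) -> Dict[str, float]:
    """Raise each ancestor's score to at least that of its descendants.

    Repeats until no score changes, so the result satisfies every rule.
    """
    fixed = dict(scores)
    changed = True
    while changed:
        changed = False
        for child, parents in PARENTS.items():
            for parent in parents:
                if fixed[parent] < fixed[child]:
                    fixed[parent] = fixed[child]
                    changed = True
    return fixed


if __name__ == "__main__":
    raw = {"term_c": 0.9, "term_b": 0.4, "term_a": 0.7}
    print("violation before:", hierarchy_violation(raw))
    repaired = propagate_to_ancestors(raw)
    print("repaired scores:", repaired)
    print("violation after:", hierarchy_violation(repaired))
```

In a soft-constraint setting, a quantity like `hierarchy_violation` would typically enter the objective as a penalty term, so that predictions can trade a small violation against strong evidence from the base predictors instead of being overwritten outright.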

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6580517
DOI: http://dx.doi.org/10.1186/s12859-019-2875-5
