Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA.

Sarah Djebali, Franck Delaplace, Hugues Roest Crollius

Genome Biol

Dyogen Lab, CNRS UMR8541, Ecole Normale Supérieure, 46 rue d'Ulm, 75005 Paris, France.

Published: September 2006

Background: Accurate, automatic gene identification in eukaryotic genomic DNA is more important than ever for efficiently exploiting the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated by the EGASP project, where the reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that an automatic method could replicate the accuracy of human annotators if the rules and decisions they use were expressed in a mathematical formalism.

Results: We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts.
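As a rough illustration of the data structure named above, the following Python sketch builds a tiny directed acyclic colored multigraph. It is not Exogean's implementation: the class name `DACM`, its methods, the node identifiers, and the edge colors (`contains`, `precedes`, `same_strand`) are all hypothetical and chosen only to show how typed objects and several kinds of relationships between the same pair of objects might be stored while keeping the graph acyclic.

```python
from collections import defaultdict


class DACM:
    """Toy directed acyclic colored multigraph (hypothetical, not Exogean's code)."""

    def __init__(self):
        self.nodes = {}                # node id -> kind label, e.g. "mRNA", "exon"
        self.edges = defaultdict(set)  # (source, target) -> set of edge colors

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def _reachable(self, start, goal):
        """Return True if goal can be reached from start along edges of any color."""
        stack, seen = [start], set()
        while stack:
            current = stack.pop()
            if current == goal:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(t for (s, t) in self.edges if s == current)
        return False

    def add_edge(self, source, target, color):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        # source -> target closes a cycle if source is already reachable from target.
        if self._reachable(target, source):
            raise ValueError("edge would create a cycle")
        # A multigraph: the same pair may carry several edges, one per color.
        self.edges[(source, target)].add(color)

    def successors(self, node_id, color):
        """Targets linked from node_id by an edge of the given color, sorted by id."""
        return sorted(
            t for (s, t), colors in self.edges.items()
            if s == node_id and color in colors
        )


# Assumed example: one mRNA alignment supporting two exons.
g = DACM()
g.add_node("mrna1", "mRNA")
g.add_node("exonA", "exon")
g.add_node("exonB", "exon")
g.add_edge("mrna1", "exonA", "contains")
g.add_edge("mrna1", "exonB", "contains")
g.add_edge("exonA", "exonB", "precedes")
g.add_edge("exonA", "exonB", "same_strand")  # second color on the same pair
```

In a structure of this kind, an annotation rule could be written as a query over edges of one color, for example collecting everything a transcript `contains` before checking how those exons are ordered.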

Conclusion: We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein-coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810556
DOI: http://dx.doi.org/10.1186/gb-2006-7-s1-s7
