AI Article Synopsis

  • The research focuses on improving the understanding of gene sets and pathways in biomedical data using enhanced gene lists called PAGs (pathways, annotated gene lists, and gene signatures), which include additional metadata to support biological insights.
  • A new clustering method named todenE combines topology-based and density-based approaches to better identify groups of PAGs, creating clearer functional representations known as Super-PAGs, while utilizing Large Language Models (LLM) to enrich contextual information.
  • Through performance comparisons and innovative metrics like the Disparity Index (DI), the study assesses different clustering methods, ultimately showing that todenE offers improved semantic quality and inclusivity in gene clustering when applied to various datasets.

Article Abstract

The integrative analysis of gene sets, networks, and pathways is pivotal for deciphering omics data in translational biomedical research. To increase gene coverage and enhance the utility of pathways, annotated gene lists, and gene signatures from diverse sources, we introduced PAGs (pathways, annotated gene lists, and gene signatures) enriched with metadata to represent biological functions. Furthermore, we established PAG-PAG networks by leveraging gene member similarity and gene regulations. In practice, however, high similarity in functional descriptions or gene membership often produces redundant PAGs, which makes an enriched PAG list fuzzy and hard to interpret. In this study, we developed todenE (topology-based and density-based ensemble) clustering, which pioneers the integration of topology-based and density-based clustering methods to detect PAG communities using the PAG network and Large Language Models (LLMs). In computational genomics annotation, genes can be grouped or clustered by their relationships and functions via guilt by association. Similarly, PAGs can be grouped into higher-level clusters, forming concise functional representations called Super-PAGs. TodenE captures PAG-PAG similarity and encapsulates functional information through an LLM to characterize network-based functional Super-PAGs. On synthetic data, we introduced a metric called the Disparity Index (DI), which measures the connectivity of gene neighbors to gauge clusterability, and we compared multiple clustering algorithms to identify the best method for generating performance-driven clusters. On non-simulated data (Gene Ontology), we used transfer learning and an LLM to form a language-based similarity embedding. TodenE uses this embedding together with the topology-based embedding to generate putative Super-PAGs with superior performance in semantic and gene member inclusiveness.
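
To make the idea of grouping PAGs by gene member similarity concrete, here is a minimal sketch. It is not the todenE method. It assumes Jaccard overlap as the similarity measure, a hypothetical threshold of 0.5 for drawing a PAG-PAG edge, and plain connected components as a simple stand-in for community detection. The PAG names and gene lists are toy values chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two gene sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_pag_network(pags, threshold=0.5):
    """Link two PAGs when their gene overlap meets the (hypothetical) threshold."""
    adj = {name: set() for name in pags}
    for u, v in combinations(pags, 2):
        if jaccard(pags[u], pags[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def connected_groups(adj):
    """Group PAGs into connected components, a simple stand-in for communities."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Toy PAGs for illustration only; not taken from any real PAG database.
pags = {
    "PAG_A": {"TP53", "MDM2", "CDKN1A", "BAX"},
    "PAG_B": {"TP53", "MDM2", "CDKN1A", "ATM"},
    "PAG_C": {"IL6", "STAT3", "JAK1"},
    "PAG_D": {"IL6", "STAT3", "JAK2"},
    "PAG_E": {"INS", "IGF1"},
}

network = build_pag_network(pags, threshold=0.5)
super_pags = connected_groups(network)
print(super_pags)
```

In this toy input, the two overlapping pairs each collapse into one group and the unrelated PAG stays on its own. A fuller pipeline along the lines the abstract describes would replace the component step with topology-based and density-based clustering and would add a language-based embedding of the functional descriptions.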

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526983
DOI: http://dx.doi.org/10.1101/2024.10.20.619308

Publication Analysis

Top Keywords

gene: 14
topology-based density-based: 12
pag network: 8
pathways annotated: 8
annotated gene: 8
gene lists: 8
lists gene: 8
gene signatures: 8
gene member: 8
functional: 5
