Motivation: A major challenge in studying gene regulation is to systematically reconstruct transcription regulatory modules, defined as sets of genes that are regulated by a common set of transcription factors. A commonly used approach to transcription module reconstruction is to derive coexpression clusters from a single microarray dataset. However, such clusters often contain false positives, because genes from many transcription modules may be perturbed simultaneously under a given type of condition. In this study, we propose and validate the hypothesis that genes that form a coexpression cluster in multiple microarray datasets across diverse conditions are more likely to form a transcription module. However, identifying genes that are coexpressed in a subset of many microarray datasets is not a trivial computational problem.

Results: We propose a graph-based data-mining approach to efficiently and systematically identify frequent coexpression clusters. Given m microarray datasets, we model each dataset as a coexpression graph and search for vertex sets that are densely connected in [θm] of the datasets (0 ≤ θ ≤ 1). For this novel graph-mining problem, we designed two techniques to narrow the search space: (1) partitioning the input graphs into (overlapping) groups that share common properties; and (2) summarizing the vertex neighbor information from the partitioned datasets in 'Neighbor Association Summary Graphs' for effective mining. We applied our method to 105 human microarray datasets and identified a large number of potential transcription modules that are activated under different subsets of conditions. Validation with ChIP-chip data demonstrated that the likelihood of a coexpression cluster being a transcription module increases significantly with its recurrence. Our method opens a new way to exploit the vast amount of accumulated microarray data for the study of gene regulation. Furthermore, the algorithm is applicable to other biological networks for approximate network module mining.
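The recurrence criterion described above can be illustrated with a minimal Python sketch. It is not the authors' mining algorithm (it implements neither the graph partitioning nor the Neighbor Association Summary Graphs); it only checks whether one candidate gene set is densely connected in enough graphs. Several details are assumptions made for illustration and do not come from the abstract: edges are drawn by thresholding the absolute Pearson correlation (`cutoff`), "densely connected" is taken to mean an edge density of at least `min_density`, and [θm] is read as the ceiling of θm. All function names are hypothetical.

```python
from itertools import combinations
from math import ceil, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_graph(expr, cutoff=0.8):
    """Edge set linking gene pairs whose |correlation| reaches the cutoff.

    `expr` maps a gene name to its expression values in one dataset.
    The correlation cutoff is an assumption, not taken from the abstract.
    """
    return {
        frozenset((g, h))
        for g, h in combinations(sorted(expr), 2)
        if abs(pearson(expr[g], expr[h])) >= cutoff
    }


def density(genes, edges):
    """Fraction of all possible gene pairs in `genes` that are edges."""
    pairs = [frozenset(p) for p in combinations(sorted(genes), 2)]
    if not pairs:
        return 0.0
    return sum(p in edges for p in pairs) / len(pairs)


def is_frequent_dense(genes, graphs, theta, min_density=0.7):
    """True if `genes` is dense in at least ceil(theta * m) of the m graphs.

    Reading [theta m] as a ceiling and using `min_density` as the
    definition of "dense" are both assumptions.
    """
    support = sum(density(genes, g) >= min_density for g in graphs)
    return support >= ceil(theta * len(graphs))
```

A call such as `is_frequent_dense({"a", "b", "c"}, graphs, theta=0.3)` evaluates a single candidate set. Enumerating all candidates this way grows exponentially with the number of genes, which is the search-space problem that the two techniques in the abstract are designed to address.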

Availability: http://zhoulab.usc.edu/NeMo/.

Source: http://dx.doi.org/10.1093/bioinformatics/btm227

