A machine learning approach for identifying novel cell type-specific transcriptional regulators of myogenesis.

PLoS Genet

Laboratory of Developmental Systems Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America.

Published: September 2012

Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers.
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type-specific developmental gene expression patterns.
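
As a rough illustration of classifying sequences by the presence or absence of binding-site motifs, the following minimal sketch scores a sequence with a Bernoulli naive Bayes model. It is not the classifier used in the study: the motif names, regular expressions, training sequences, and function names are all hypothetical placeholders chosen for the example, and naive Bayes is swapped in only because it is simple to show.

```python
import math
import re

# Toy motif patterns. Names and sequences are illustrative placeholders,
# not the TFBSs learned in the study.
TOY_MOTIFS = {
    "motif_A": re.compile("GGAA"),
    "motif_B": re.compile("TGTTTAC"),
    "motif_C": re.compile("GGTGTGA"),
}

def motif_presence(seq):
    """Return a 0/1 vector marking which toy motifs occur in seq."""
    seq = seq.upper()
    return [1 if pat.search(seq) else 0 for pat in TOY_MOTIFS.values()]

def train_bernoulli_nb(X, y):
    """Fit per-class motif-presence probabilities with Laplace smoothing."""
    model = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        probs = [(sum(col) + 1) / (n + 2) for col in zip(*rows)]
        model[label] = {"prior": n / len(y), "probs": probs}
    return model

def log_odds(model, x):
    """Log odds that x belongs to class 1 (enhancer-like) versus class 0."""
    def log_lik(label):
        m = model[label]
        total = math.log(m["prior"])
        for xi, p in zip(x, m["probs"]):
            total += math.log(p if xi else 1 - p)
        return total
    return log_lik(1) - log_lik(0)

# Made-up training data: "positives" stand in for known enhancers,
# "negatives" for background sequence.
positives = ["ACGGAATTTGTTTACAA", "TTGGAACCGGTGTGAAT", "GGAATGTTTACGGTGTGA"]
negatives = ["ACACACACACACAC", "TTTTCCCCTTTTCCCC", "ATATATGGAAATATAT"]

X = [motif_presence(s) for s in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
model = train_bernoulli_nb(X, y)

# A positive score means the motif content looks more like the positives.
score = log_odds(model, motif_presence("CCGGAAGGTGTGACC"))
```

In this toy setting, a candidate sequence carrying motifs that are frequent among the positives receives a positive log-odds score, while a sequence with none of them scores negative. A real application would need experimentally supported motif models and far more training data than shown here.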

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3297574
DOI: http://dx.doi.org/10.1371/journal.pgen.1002531

Similar Publications

Adaptive deep feature representation learning for cross-subject EEG decoding.

BMC Bioinformatics

December 2024

College of Computer and Information Engineering/College of Artificial Intelligence, Nanjing Tech University, Nanjing, 210093, China.

Background: The collection of substantial amounts of electroencephalogram (EEG) data is typically time-consuming and labor-intensive, which adversely impacts the development of decoding models with strong generalizability, particularly when the available data is limited. Utilizing sufficient EEG data from other subjects to aid in modeling the target subject presents a potential solution, commonly referred to as domain adaptation. Most current domain adaptation techniques for EEG decoding primarily focus on learning shared feature representations through domain alignment strategies.

Background: Wide QRS complex tachycardia (WCT) differentiation into ventricular tachycardia (VT) and supraventricular wide complex tachycardia (SWCT) remains challenging despite numerous 12-lead electrocardiogram (ECG) criteria and algorithms. Automated solutions leveraging computerized ECG interpretation (CEI) measurements and engineered features offer practical ways to improve diagnostic accuracy. We propose automated algorithms based on (i) WCT QRS polarity direction (WCT Polarity Code [WCT-PC]) and (ii) QRS polarity shifts between WCT and baseline ECGs (QRS Polarity Shift [QRS-PS]).

Design of experiments (DOE) is an established method to allocate resources for efficient parameter space exploration. Model based active learning (AL) data sampling strategies have shown potential for further optimization. This paper introduces a workflow for conducting DOE comparative studies using automated machine learning.

Healthy ageing plays an important role in ageing societies in many countries, and centenarians are a sign of longevity. Longevity and its determinants have become issues of global concern and also a focus of research. Although many disciplines have carried out a series of studies on longevity phenomena, few studies have systematically considered the impact of geographical environmental factors.

Self-supervised denoising of grating-based phase-contrast computed tomography.

Sci Rep

December 2024

Research Group Biomedical Imaging Physics, Department of Physics, TUM School of Natural Sciences, Technical University of Munich, 85748, Garching, Germany.

In the last decade, grating-based phase-contrast computed tomography (gbPC-CT) has received growing interest. It provides additional information about the refractive index decrement in the sample. This signal shows an increased soft-tissue contrast.
