Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework.

Brief Bioinform

Monash Biomedicine Discovery Institute, Monash University, Australia. He is also affiliated with the Monash Centre for Data Science, Faculty of Information Technology, Monash University. His research interests include bioinformatics, computational biology, machine learning, data mining, and pattern recognition.

Published: March 2021

Promoters are short consensus DNA sequences responsible for the transcriptional activation or repression of genes. Bacteria have many types of promoters, each with an important role in initiating gene transcription, so solving the promoter-identification problem has important implications for understanding their functions. Computational methods for promoter classification have been established, but their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for identifying promoters and classifying them by type. SELECTOR combines the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features, and uses five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests on benchmark datasets and independent tests on a newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in predicting both general and specific types of promoters in Escherichia coli. Furthermore, the framework uses the Shapley Additive exPlanation algorithm to provide interpretations that aid understanding of the model's success. These interpretations highlight the features most important for predicting both general and specific types of promoters, and they overcome a limitation of existing 'black-box' approaches, which are unable to reveal causal relationships from large numbers of initially encoded features.
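As an illustration of the first feature type named in the abstract, the composition of k-spaced nucleic acid pairs can be sketched in a few lines of Python. This is a minimal sketch and not the SELECTOR implementation: the function name `cksnap`, the `max_gap` parameter and its default, and the choice to skip pairs containing non-ACGT characters are all assumptions made for the example. For each gap k, it counts every pair of nucleotides separated by k positions and divides by the number of counted pairs, giving 16 frequencies per gap.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# The 16 ordered nucleotide pairs: AA, AC, AG, AT, CA, ...
PAIRS = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def cksnap(seq, max_gap=2):
    """Return k-spaced pair frequencies for gaps 0..max_gap (hypothetical helper).

    The result has 16 * (max_gap + 1) values, ordered by gap and then by PAIRS.
    Pairs containing characters other than A, C, G, T are skipped (an assumption).
    """
    seq = seq.upper()
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Pair the nucleotide at position i with the one gap + 1 positions later.
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        # Normalise counts to frequencies; an empty window yields zeros.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    vector = cksnap("TATAATGCGCTTGACA", max_gap=2)
    print(len(vector))  # 16 features per gap, 3 gaps
```

A vector like this would typically be concatenated with the other feature encodings before being passed to the base learners of a stacked model.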

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7986616
DOI: http://dx.doi.org/10.1093/bib/bbaa049

Similar Publications

Characterisation of a Betasatellite Associated With Tomato Yellow Leaf Curl Guangdong Virus and Discovery of an Unusual Modulation of Virus Infection Associated With C4 Protein.

Mol Plant Pathol

January 2025

Guangdong Provincial Key Laboratory of High Technology for Plant Protection, Plant Protection Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China.

Tomato yellow leaf curl Guangdong virus (TYLCGdV), a monopartite begomovirus first identified in 2004, remains poorly characterised. In this study, we demonstrate that TYLCGdV associates with a betasatellite, TYLCGdB, and the βC1 protein encoded by TYLCGdB is essential for symptom development. We also explore the role of TYLCGdV C4 protein by generating a C4-deficient infectious clone (TYLCGdV), revealing a dynamic role for TYLCGdV C4.

Effects of population aging on quality of life and disease burden: a population-based study.

Glob Health Res Policy

January 2025

Center for Public Health and Epidemic Preparedness and Response, Peking University, Haidian District, 38th Xueyuan Road, Beijing, 100191, China.

Background: As population aging intensifies, it becomes increasingly important to elucidate the causal relationship between aging and changes in population health. Our study therefore set out to develop a systematic attribution framework to comprehensively evaluate the health impacts of population aging.

Methods: We used health-adjusted life expectancy (HALE) to measure quality of life and disability-adjusted life years (DALY) to quantify the burden of disease for the population of Guangzhou.

Background: The pathogenesis of non-alcoholic fatty liver disease (NAFLD) with a global prevalence of 30% is multifactorial and the involvement of gut bacteria has been recently proposed. However, finding robust bacterial signatures of NAFLD has been a great challenge, mainly due to its co-occurrence with other metabolic diseases.

Results: Here, we collected public metagenomic data and integrated the taxonomy profiles with in silico generated community metabolic outputs and detailed clinical data for 1206 Chinese subjects with or without metabolic diseases, including NAFLD (obese and lean), obesity, T2D, hypertension, and atherosclerosis.

Background: Several approaches are being explored for engineering off-the-shelf chimeric antigen receptor (CAR) T cells. In this study, we engineered chimeric Fcγ receptor (FcγR) T cells and tested their potential as a versatile platform for universal T cell therapy.

Methods: Chimeric FcγR (CFR) constructs were generated using three distinct forms of FcγR, namely CD16A, CD32A, and CD64.

Background: Current diagnostic imaging modalities have limited ability to differentiate between malignant and benign pancreaticobiliary disease, and lack accuracy in detecting lymph node metastases. 18F-Prostate-Specific Membrane Antigen (PSMA) PET/CT is an imaging modality used for staging of prostate cancer, but has incidentally also identified PSMA-avid pancreatic lesions, histologically characterized as pancreatic ductal adenocarcinoma (PDAC). This phase I/II study aimed to assess the feasibility of 18F-PSMA PET/CT to detect PDAC.
