Machine learning-guided differential gene expression analysis identifies a highly-connected seven-gene cluster in triple-negative breast cancer.

Biomedicine (Taipei)

Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, University of Sadat City, Sadat City, Egypt.

Published: December 2024

Background: Triple-negative breast cancer (TNBC), one of the most challenging cancers, is subdivided into multiple molecular subtypes. Its high degree of heterogeneity makes precision medicine difficult to apply. Machine learning (ML)-guided gene selection can optimize differential gene expression analysis and, through biomarker discovery, substantially advance precision medicine.

Purpose: To enhance precision medicine in oncology by identifying the most representative differentially expressed genes for use as biomarkers or as novel drug targets.

Methods: Using data from the Gene Expression Omnibus (GEO) repository and The Cancer Genome Atlas (TCGA), we identified differentially expressed genes with the linear models for microarray data (LIMMA) and edgeR algorithms, and applied ML-based feature selection using several algorithms.
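The abstract does not detail the statistics, so the following is only a minimal sketch of the general idea behind a two-group differential expression screen, written in pure Python. It is a simplified stand-in, not LIMMA or edgeR: LIMMA moderates per-gene variances with an empirical-Bayes step, and edgeR models read counts with negative binomial distributions, whereas this sketch uses an unmoderated Welch t-statistic, a normal approximation for the p-value (reasonable only for larger groups), and Benjamini-Hochberg FDR correction. The function names, cut-offs, and toy data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (assumes non-constant values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def normal_two_sided_p(t):
    """Two-sided p-value using a standard-normal approximation to t."""
    return math.erfc(abs(t) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_expression(expr, case_idx, ctrl_idx, lfc_cut=1.0, fdr_cut=0.05):
    """Hypothetical helper: expr maps gene -> log2 values, one per sample.

    Returns {gene: (log2 fold change, adjusted p)} for genes passing
    both the |log2FC| and FDR cut-offs.
    """
    genes = list(expr)
    lfcs, pvals = [], []
    for g in genes:
        case = [expr[g][i] for i in case_idx]
        ctrl = [expr[g][i] for i in ctrl_idx]
        # Values are already log2, so the fold change is a difference of means.
        lfcs.append(mean(case) - mean(ctrl))
        pvals.append(normal_two_sided_p(welch_t(case, ctrl)))
    fdr = benjamini_hochberg(pvals)
    return {
        g: (lfc, q)
        for g, lfc, q in zip(genes, lfcs, fdr)
        if abs(lfc) >= lfc_cut and q <= fdr_cut
    }


if __name__ == "__main__":
    # Toy log2 expression: samples 0-3 are "cases", 4-7 are "controls".
    toy = {
        "GENE_A": [8.1, 8.3, 7.9, 8.2, 5.0, 5.2, 4.9, 5.1],  # higher in cases
        "GENE_B": [6.0, 6.1, 5.9, 6.2, 6.1, 6.0, 5.8, 6.1],  # roughly unchanged
    }
    print(sorted(differential_expression(toy, [0, 1, 2, 3], [4, 5, 6, 7])))
    # prints ['GENE_A']
```

With real data, LIMMA or edgeR would replace the `welch_t` and `normal_two_sided_p` steps; the fold-change and FDR filtering logic stays conceptually the same.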

Results: Merging the features identified by LIMMA and by ML-based feature selection yielded a total of 27 genes. The models with the highest area under the curve (AUC) were the CatBoost, Extreme Gradient Boosting (XGBoost), Random Forest, and Multi-Layer Perceptron classifiers. ESR1, FOXA1, GATA3, XBP1, GREB1, AR, and AGR2 were identified as hub genes forming a highly interconnected cluster.
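The abstract does not state how the gene lists were merged, how AUC was computed, or how hub genes were scored. As a hedged illustration only, the pure-Python sketch below shows three common building blocks: merging gene lists as a set union (an assumption; the study may have used another rule), computing ROC AUC from classifier scores as the probability that a random positive outranks a random negative (the Mann-Whitney formulation, with ties counted as one half), and ranking genes by degree in an undirected interaction network, a simple proxy for "hub" status. All function names, gene labels, and edges are hypothetical toy inputs, not the study's data.

```python
from itertools import product


def merge_gene_lists(*gene_lists):
    """Union of several gene lists, sorted (assumed merge rule)."""
    merged = set()
    for genes in gene_lists:
        merged.update(genes)
    return sorted(merged)


def roc_auc(labels, scores):
    """ROC AUC = P(score of a random positive > score of a random negative).

    Compares every positive/negative pair, counting ties as 0.5; this equals
    the Mann-Whitney U statistic divided by (n_pos * n_neg).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_hubs(edges, top_k=3):
    """Rank nodes by degree (number of distinct neighbours), highest first."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = {node: len(nbrs) for node, nbrs in neighbours.items()}
    # Break degree ties alphabetically so the output is deterministic.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    print(merge_gene_lists(["G1", "G2", "G3"], ["G3", "G4"]))
    # prints ['G1', 'G2', 'G3', 'G4']
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
    # prints 0.75
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
    print(rank_hubs(toy_edges))
    # prints [('A', 3), ('B', 2), ('C', 2)]
```

The pairwise AUC loop is quadratic in sample count, which is fine for illustration; a rank-sum formulation gives the same value more efficiently on large cohorts.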

Conclusion: ML-based gene selection substantially aids the identification of hub genes. The resulting ML models can improve diagnosis and prognosis in precision oncology. The identified hub genes can serve as biomarkers and warrant further research as potential drug targets.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11703398
DOI: http://dx.doi.org/10.37796/2211-8039.1467