SEMtree: tree-based structure learning methods with structural equation models.

Bioinformatics

Department of Brain and Behavioral Sciences, University of Pavia, Pavia 27100, Italy.

Published: June 2023

Motivation: With the exponential growth of expression and protein-protein interaction (PPI) data, identifying functional modules in PPI networks that show striking changes in molecular activity or phenotypic signatures has become of particular interest, because such modules reveal process-specific information that is correlated with cellular or disease states. This requires both assigning reliability scores to network nodes and an efficient technique for locating the network regions with the highest scores. A number of heuristic methods have been suggested in the literature. We propose SEMtree(), a set of tree-based structure discovery algorithms that combines graph and statistically interpretable parameters with a user-friendly R package based on the structural equation modeling framework.
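The idea of scoring nodes and then ranking network regions can be illustrated with a minimal Python sketch. It assumes a commonly used aggregate z-score (gene P-values mapped to z-scores, then summed and divided by the square root of the module size); this is not necessarily the scoring used inside SEMtree(), and the function names and the P-values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def node_scores(p_values):
    """Map each gene P-value to a z-score; smaller P-values give larger scores."""
    norm = NormalDist()  # standard normal distribution
    return {gene: norm.inv_cdf(1.0 - p) for gene, p in p_values.items()}


def module_score(genes, scores):
    """Aggregate z-score of a gene set: sum of z-scores over sqrt(set size)."""
    genes = list(genes)
    return sum(scores[g] for g in genes) / sqrt(len(genes))


# Hypothetical P-values for four genes (illustration only, not real data).
p_values = {"g1": 0.001, "g2": 0.01, "g3": 0.5, "g4": 0.9}
scores = node_scores(p_values)

# A module of strongly perturbed genes scores higher than a module of
# unperturbed genes, which is what an active-subnetwork search exploits.
perturbed = module_score(["g1", "g2"], scores)
background = module_score(["g3", "g4"], scores)
```

A search heuristic would then look for connected node sets that maximize such a score; the scoring step shown here is only the first half of that problem.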

Results: Condition-specific changes from differential expression and gene-gene co-expression are recovered by statistical testing of node, directed edge, and directed path differences between groups. Finally, starting from a list of seed (i.e. disease) genes or gene P-values, perturbed modules with undirected edges are generated with five state-of-the-art active subnetwork detection methods. These modules are then supplied to causal additive trees based on Chu-Liu-Edmonds' algorithm (Chow and Liu, Approximating discrete probability distributions with dependence trees. IEEE Trans Inform Theory 1968;14:462-7) in SEMtree() to be converted into directed trees. This conversion allows the methods to be compared in terms of directed active subnetworks. We applied SEMtree() to both a coronavirus disease (COVID-19) RNA-seq dataset (GEO accession: GSE172114) and simulated datasets with various differential expression patterns. Compared to existing methods, SEMtree() captures biologically relevant subnetworks with simple visualization of directed paths, good perturbation extraction, and good classifier performance.
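The conversion of an undirected module into a directed tree relies on finding a maximum-weight spanning arborescence, which is what the Chu-Liu-Edmonds algorithm computes. The Python sketch below is a generic, unoptimized version of that algorithm, not the SEMgraph implementation: the function name max_arborescence, the root choice, and the edge weights are hypothetical, and how SEMtree() derives directed edge weights from data is not shown. Each undirected edge is assumed to have been expanded into two directed edges with possibly different weights, and every node is assumed to be reachable from the root.

```python
from itertools import count

_fresh = count()  # generates unique labels for contracted cycles


def max_arborescence(edges, root):
    """Chu-Liu-Edmonds: edges maps (u, v) -> weight; returns a set of (u, v)."""
    # Step 1: every non-root node keeps its heaviest incoming edge.
    best = {}
    for (u, v), w in edges.items():
        if v == root or u == v:
            continue
        if v not in best or w > edges[(best[v], v)]:
            best[v] = u
    # Step 2: follow the chosen parents to look for a directed cycle.
    cycle = None
    for start in best:
        path, node = [], start
        while node in best and node not in path:
            path.append(node)
            node = best[node]
        if node in path:
            cycle = set(path[path.index(node):])
            break
    if cycle is None:
        return {(u, v) for v, u in best.items()}
    # Step 3: contract the cycle into one super-node and reweight the edges
    # entering it by the cost of dropping the cycle edge they would replace.
    c = ("super", next(_fresh))
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if v == root or u == v or (u in cycle and v in cycle):
            continue
        if v in cycle:
            key, w2 = (u, c), w - edges[(best[v], v)]
        elif u in cycle:
            key, w2 = (c, v), w
        else:
            key, w2 = (u, v), w
        if key not in new_edges or w2 > new_edges[key]:
            new_edges[key], origin[key] = w2, (u, v)
    sub = max_arborescence(new_edges, root)
    # Step 4: expand the super-node, keeping all cycle edges except the one
    # pointing at the node where the tree enters the cycle.
    tree = {origin[e] for e in sub}
    entered = next(v for (u, v) in tree if v in cycle and u not in cycle)
    tree |= {(best[x], x) for x in cycle if x != entered}
    return tree


# Hypothetical directed weights for a four-gene module rooted at gene "A".
edges = {
    ("A", "B"): 2, ("A", "C"): 1,
    ("B", "C"): 5, ("C", "B"): 5,
    ("B", "D"): 2, ("C", "D"): 3,
}
tree = max_arborescence(edges, "A")
```

In this toy module, B and C each prefer the other as parent, which creates a cycle; contracting it and re-expanding yields the tree A→B, B→C, C→D with total weight 10, the heaviest directed tree rooted at A.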

Availability and implementation: The SEMtree() function is implemented in the R package SEMgraph, available at https://CRAN.R-project.org/package=SEMgraph.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287946
DOI: http://dx.doi.org/10.1093/bioinformatics/btad377

Similar Publications

StackDILI: Enhancing Drug-Induced Liver Injury Prediction through Stacking Strategy with Effective Molecular Representations.

J Chem Inf Model

January 2025

Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China.

Jiahui Guan Danhong Dong Peilin Xie Zhihao Zhao Yilin Guo

Drug-induced liver injury (DILI) is a major challenge in drug development, often leading to clinical trial failures and market withdrawals due to liver toxicity. This study presents StackDILI, a computational framework designed to accelerate toxicity assessment by predicting DILI risk. StackDILI integrates multiple molecular descriptors to extract structural and physicochemical features, including the constitution, pharmacophore, MACCS, and E-state descriptors.

Prediction of Pt, Ir, Ru, and Rh complexes light absorption in the therapeutic window for phototherapy using machine learning.

J Cheminform

January 2025

PROMOCS Laboratory, Department of Chemistry and Chemical Technologies, University of Calabria, Arcavacata di Rende (CS), Italy.

V Vigna T F G G Cova A A C C Pais E Sicilia

Effective light-based cancer treatments, such as photodynamic therapy (PDT) and photoactivated chemotherapy (PACT), rely on compounds that are activated by light efficiently, and absorb within the therapeutic window (600-850 nm). Traditional prediction methods for these light absorption properties, including Time-Dependent Density Functional Theory (TDDFT), are often computationally intensive and time-consuming. In this study, we explore a machine learning (ML) approach to predict the light absorption in the region of the therapeutic window of platinum, iridium, ruthenium, and rhodium complexes, aiming at streamlining the screening of potential photoactivatable prodrugs.

HLA-EpiCheck: novel approach for HLA B-cell epitope prediction using 3D-surface patch descriptors derived from molecular dynamic simulations.

Bioinform Adv

December 2024

LORIA, Université de Lorraine, CNRS, INRIA, Nancy 54000, France.

Diego Amaya-Ramirez Magali Devriese Romain Lhotte Cédric Usureau Malika Smaïl-Tabbone

Motivation: The human leukocyte antigen (HLA) system is the main cause of organ transplant loss through the recognition of HLAs present on the graft by donor-specific antibodies raised by the recipient. It is therefore of key importance to identify all potentially immunogenic B-cell epitopes on HLAs in order to refine organ allocation. Such HLA epitopes are currently characterized by the presence of polymorphic residues called "eplets".

The complete chloroplast genome and phylogenetic analysis of Miq 1861 (Rosaceae).

Mitochondrial DNA B Resour

December 2024

College of Horticulture and Landscape Architecture, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong, China.

Wei Guo Longyuan Wang Wei Wu

The wild raspberry species Miq 1861 is a promising resource for breeding thermotolerant cultivars. Its complete chloroplast genome spans 155,935 base pairs (bp), featuring the classic quadripartite structure: an 18,729 bp small single-copy region, an 85,662 bp large single-copy region, and two 25,772 bp inverted repeats. A total of 130 genes were identified, including 86 protein-coding genes, 36 tRNA genes, and 8 rRNA genes.

Leveraging machine learning for the detection of structured interference in Global Navigation Satellite Systems.

PeerJ Comput Sci

November 2024

Industrial Engineering Department, College of Engineering, King Saud University, Riyadh, Saudi Arabia.

Imtiaz Nabi Salma Zainab Farooq Sunnyaha Saeed Syed Ali Irtaza Khurram Shehzad

Radio frequency interference disrupts services offered by Global Navigation Satellite Systems (GNSS). Spoofing is the transmission of structured interference signals intended to deceive GNSS location and timing services. The identification of spoofing is vital, especially for safety-of-life aviation services, since the receiver is unaware of counterfeit signals.
