Motivation: Tumor trees, which depict the evolutionary process of cancer, provide a backbone for discovering recurring evolutionary processes in cancer. Although they are not the primary information extracted from genomic data, they are valuable for this purpose. One way to extract such information is to summarize multiple trees into a single representative tree, such as a consensus tree or a supertree.

Results: We define the "weighted centroid tree problem" to find the centroid tree of a set of single-labeled rooted trees through the following steps: (i) mapping the given trees into Euclidean space, (ii) computing the weighted centroid matrix of the mapped trees, and (iii) finding the mapped tree nearest to the centroid matrix (the nearest mapped tree problem, NMTP). We show that this setup encompasses the previously studied parent-child and ancestor-descendant metrics as well as the GraPhyC and TuELiP consensus tree algorithms. Moreover, we show that, while NMTP is polynomial-time solvable for the adjacency embedding, it is NP-hard for the ancestry and distance mappings. We introduce integer linear programs for NMTP in different setups, and for the ancestry embedding we provide a new algorithm, 2-AncL2, which uses a novel weighting scheme for ancestry signals. Our experimental results show that 2-AncL2 outperforms the available consensus tree algorithms. We also illustrate an application of our setup by providing representative trees for a large real breast cancer dataset, deducing that the cluster centroid trees summarize reliable evolutionary information about the original dataset.
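The three steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the integer linear programs and 2-AncL2 with exhaustive search over all rooted trees, which is only feasible for a handful of nodes. The parent-array tree representation, the fixed root at node 0, the squared Euclidean distance, and all function names are assumptions made for illustration.

```python
from itertools import product

# Assumption: a tree on nodes 0..n-1 is a parent array; parent[root] is None.

def is_rooted_tree(parent, root=0):
    """Check that every node reaches the root by following parent links."""
    for start in range(len(parent)):
        seen, v = set(), start
        while v != root:
            if v in seen:          # revisited a node: cycle, not a tree
                return False
            seen.add(v)
            v = parent[v]
    return True

def adjacency(parent):
    """Step (i), adjacency embedding: entry [p][c] is 1 if p is the parent of c."""
    n = len(parent)
    m = [[0.0] * n for _ in range(n)]
    for child, par in enumerate(parent):
        if par is not None:
            m[par][child] = 1.0
    return m

def ancestry(parent):
    """Step (i), ancestry embedding: entry [a][d] is 1 if a is a proper ancestor of d."""
    n = len(parent)
    m = [[0.0] * n for _ in range(n)]
    for node in range(n):
        a = parent[node]
        while a is not None:
            m[a][node] = 1.0
            a = parent[a]
    return m

def weighted_centroid(mats, weights):
    """Step (ii): entrywise weighted mean of the mapped trees."""
    total, n = sum(weights), len(mats[0])
    return [[sum(w * m[i][j] for m, w in zip(mats, weights)) / total
             for j in range(n)] for i in range(n)]

def sq_dist(a, b):
    """Squared Euclidean distance between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def nearest_mapped_tree(centroid, embed, root=0):
    """Step (iii) by brute force: try every parent assignment, keep the closest tree."""
    n = len(centroid)
    others = [v for v in range(n) if v != root]
    best, best_d = None, float("inf")
    for choice in product(range(n), repeat=n - 1):
        parent = [None] * n
        for v, p in zip(others, choice):
            parent[v] = p
        if not is_rooted_tree(parent, root):
            continue
        d = sq_dist(embed(parent), centroid)
        if d < best_d:
            best, best_d = parent, d
    return best, best_d

# Toy input: three trees on four nodes with equal weights (made-up data).
trees = [[None, 0, 1, 1], [None, 0, 1, 2], [None, 0, 1, 1]]
weights = [1.0, 1.0, 1.0]

centroid_adj = weighted_centroid([adjacency(t) for t in trees], weights)
best_adj, dist_adj = nearest_mapped_tree(centroid_adj, adjacency)

centroid_anc = weighted_centroid([ancestry(t) for t in trees], weights)
best_anc, dist_anc = nearest_mapped_tree(centroid_anc, ancestry)

print(best_adj, best_anc)
```

Passing the embedding as a function keeps steps (ii) and (iii) independent of the chosen mapping, so the same search can be reused for the adjacency and ancestry cases. The exhaustive loop is exponential in the number of nodes, which is why a practical solver needs something like the integer linear programs described above.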

Availability And Implementation: https://github.com/vasei/WAncILP.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520232
DOI: http://dx.doi.org/10.1093/bioinformatics/btae120
