We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton-Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as $\chi^2$ and $F$ random variables. We illustrate our methods on an important application: detecting tumor heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10066867 | PMC |
http://dx.doi.org/10.1080/01621459.2016.1240081 | DOI Listing |