Multivariate binary classification of imbalanced datasets: A case study based on high-dimensional multiplex autoimmune assay data.

Biom J

Department of Mathematical Statistics with Applications in Biometrics, Faculty of Statistics, Technical University Dortmund, Vogelpothsweg 87, 44227, Dortmund, Germany.

Published: September 2017

Classifying a population by a specific trait is a major task in medicine, both in diagnostic settings, where groups of patients with specific diseases are identified, and in predictive medicine, where patients are assigned to disease severity classes that might profit from different treatments. When those subgroups are small, as in rare diseases, imbalance between the classes is the rule rather than the exception. Imbalance makes statistical classification problematic: many observations are assigned to the majority class, so the error rate of the minority class is high while the error rate of the majority class stays low. This case study investigates the effect of class imbalance on Random Forests and Powered Partial Least Squares Discriminant Analysis (PPLS-DA) and evaluates the performance of these classifiers when they are combined with methods that compensate for imbalance (sampling methods and cost-sensitive learning approaches). We evaluate all approaches with a scoring system that takes the classification results into consideration. The case study is based on one high-dimensional multiplex autoimmune assay dataset that describes immune responses to antigens and comprises two classes of patients: Rheumatoid Arthritis (RA) and Systemic Lupus Erythematosus (SLE). Datasets with varying degrees of imbalance are created by successively reducing the size of the RA class. Our results indicate a possible benefit of cost-sensitive learning approaches for Random Forests. Although further research on other datasets or large-scale simulation studies is needed to verify our findings, we believe this work can raise practitioners' awareness of class imbalance and underlines the importance of considering methods that compensate for it.
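The imbalance setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`reduce_class`, `per_class_error`, `balanced_class_weights`), the toy class sizes, and the inverse-frequency weighting heuristic are all assumptions made for illustration, and the trivial "always predict SLE" rule stands in for a real classifier. The sketch shrinks one class step by step, reports the per-class error rates, and prints one common choice of class weights that a cost-sensitive learner could use.

```python
import random
from collections import Counter


def reduce_class(labels, target, keep, seed=0):
    """Return sorted sample indices after keeping only `keep` randomly
    chosen samples of class `target`; all other classes are kept whole."""
    rng = random.Random(seed)
    target_idx = [i for i, y in enumerate(labels) if y == target]
    other_idx = [i for i, y in enumerate(labels) if y != target]
    if keep > len(target_idx):
        raise ValueError("cannot keep more samples than the class contains")
    return sorted(other_idx + rng.sample(target_idx, keep))


def per_class_error(y_true, y_pred):
    """Fraction of misclassified samples, computed separately per true class."""
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {c: wrong[c] / totals[c] for c in totals}


def balanced_class_weights(labels):
    """Inverse-frequency weights n / (k * n_c): an assumed heuristic,
    not necessarily the cost scheme used in the study."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Toy labels with hypothetical class sizes (not the sizes of the real dataset).
labels = ["RA"] * 100 + ["SLE"] * 100

for keep in (100, 50, 10):
    idx = reduce_class(labels, "RA", keep)
    y = [labels[i] for i in idx]
    # Stand-in classifier that always predicts the (eventual) majority class.
    y_pred = ["SLE"] * len(y)
    print(keep, per_class_error(y, y_pred), balanced_class_weights(y))
```

With the stand-in rule, the overall error shrinks as the RA class shrinks even though every RA sample is misclassified, which is why per-class error rates, rather than overall accuracy, are the quantity to watch. The weights grow for the reduced class, so errors on it would cost more in a weighted learner.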

Source
http://dx.doi.org/10.1002/bimj.201600207

