Clustering aims to group data naturally according to the underlying data distribution. The data distribution is often estimated using a parametric or nonparametric model, e.g., a Gaussian mixture or kernel density estimation. Compared with nonparametric models, parametric models are statistically stable, i.e., a small perturbation of the data points leads to a small change in the estimated density. However, parametric models are highly sensitive to outliers, because in the presence of outliers the data distribution departs strongly from the parametric assumptions. Given a parametric clustering algorithm, this paper shows how to turn it into a robust one. The idea is to modify the original parametric density into a semiparametric one. The high-density data that form the core of each cluster are modeled with the original parametric density. The low-density data are often far away from the cluster cores and may have an arbitrary shape, and are therefore modeled using a nonparametric density. A combination of parametric and nonparametric clustering algorithms is used to group the data modeled by the semiparametric density. From the viewpoint of robust statistics, the proposed method has good robustness properties. We test the proposed algorithm on several synthetic data sets and 70 UCI data sets. The results indicate that the semiparametric method can significantly improve clustering performance.
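
The core idea in the abstract above (a parametric model for the high-density cluster cores, a nonparametric rule for the low-density remainder) can be illustrated with a minimal sketch. This is not the authors' algorithm, only an illustration under stated assumptions: a Gaussian kernel density estimate ranks points by density, k-means stands in for the parametric clustering step (it is the hard-assignment limit of a spherical Gaussian mixture), and nearest-labeled-neighbor propagation stands in for the nonparametric step. The function names and the parameters `bandwidth` and `core_fraction` are hypothetical.

```python
import numpy as np

def kde_density(X, bandwidth):
    # Gaussian kernel density estimate at each data point (up to a constant factor).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)

def kmeans(X, k, n_iter=50):
    # Stand-in for the parametric step (assumption, not the paper's model).
    # Deterministic farthest-point initialization of the k centers.
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def semiparametric_cluster(X, k, bandwidth=1.0, core_fraction=0.8):
    density = kde_density(X, bandwidth)
    order = np.argsort(-density)                 # highest density first
    n_core = max(k, int(core_fraction * len(X)))
    core, rest = order[:n_core], order[n_core:]
    labels = np.full(len(X), -1)
    # Parametric step: cluster only the high-density core points.
    labels[core] = kmeans(X[core], k)
    # Nonparametric step (assumption): each low-density point, visited in
    # decreasing density order, takes the label of its nearest labeled point.
    for i in rest:
        labeled = np.flatnonzero(labels >= 0)
        d2 = ((X[labeled] - X[i]) ** 2).sum(axis=1)
        labels[i] = labels[labeled[d2.argmin()]]
    return labels

# Toy data: two Gaussian blobs plus two outliers far from either core.
rng = np.random.default_rng(0)
a = rng.normal([0.0, 0.0], 0.5, size=(40, 2))
b = rng.normal([8.0, 0.0], 0.5, size=(40, 2))
outliers = np.array([[-4.0, 0.0], [12.0, 0.0]])
X = np.vstack([a, b, outliers])
labels = semiparametric_cluster(X, k=2)
```

Because the outliers are excluded from the parametric fit, they cannot pull the cluster centers away from the cores; they are attached afterwards to whichever cluster they are closest to. The choice of `core_fraction` controls how much of the data the parametric model is trusted to explain.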

Source: http://dx.doi.org/10.1109/TNNLS.2018.2884790
