Background: Outliers, data points that deviate markedly from the rest of the data, can substantially affect statistical inference and can also provide valuable insights in data analysis. Many outlier detection methods have been developed; however, almost all of them ignore the spatial dependence and heterogeneity present in spatial data. Spatial data come in diverse formats and carry diverse semantics, so detecting outliers in them requires specialized methodology that accounts for these properties. To date, little research exists on robust spatial outlier detection methods designed specifically for the spatial error model (SEM) structure.
Method: We propose the Spatial-Θ-Iterative Procedure for Outlier Detection (Spatial-Θ-IPOD), which uses a mean-shift vector to identify outliers within the SEM. The method detects spatial outliers effectively while also providing robust coefficient estimates. To assess its performance, we conducted extensive simulations and applied it to an empirical study of life expectancy data from multiple countries.
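The abstract does not state the model or the update rules, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions: the spatial parameter λ (`lam`) is known, the spatial weights matrix `W` is given, and the mean-shift vector γ enters the filtered equation (I - λW)y = (I - λW)Xβ + γ + ε. Under that assumption, a Θ-IPOD-style alternation (least squares for β, hard thresholding of residuals for γ) can be run directly on the filtered data. The functions `spatial_ipod_sketch` and `line_weights`, the fixed `threshold`, and the simulated demo data are hypothetical illustrations, not the authors' Spatial-Θ-IPOD algorithm, which would also need to estimate λ and to choose the threshold from the data (for example with an information criterion).

```python
import numpy as np


def hard_threshold(x, t):
    """Hard-thresholding rule: keep entries with |x| > t, set the rest to zero."""
    return np.where(np.abs(x) > t, x, 0.0)


def spatial_ipod_sketch(y, X, W, lam, threshold, max_iter=100, tol=1e-8):
    """Theta-IPOD-style alternation on spatially filtered data (illustrative only).

    Hypothetical model with lam KNOWN:
        (I - lam*W) y = (I - lam*W) X beta + gamma + eps,
    where gamma is a sparse mean-shift vector whose nonzeros flag outliers.
    """
    n = len(y)
    A = np.eye(n) - lam * W          # spatial filter I - lam*W
    y_f, X_f = A @ y, A @ X          # filtered response and design matrix
    gamma = np.zeros(n)
    for _ in range(max_iter):
        # beta-step: least squares on the shift-corrected filtered response
        beta, *_ = np.linalg.lstsq(X_f, y_f - gamma, rcond=None)
        # gamma-step: hard-threshold the filtered residuals
        gamma_new = hard_threshold(y_f - X_f @ beta, threshold)
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    # Final coefficient estimate given the last mean-shift vector
    beta, *_ = np.linalg.lstsq(X_f, y_f - gamma, rcond=None)
    return beta, gamma, np.flatnonzero(gamma)


def line_weights(n):
    """Hypothetical row-standardized weights for units on a line (neighbors i-1, i+1)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                W[i, j] = 1.0
    return W / W.sum(axis=1, keepdims=True)


# Demo on data simulated from the same hypothetical model
rng = np.random.default_rng(0)
n, lam = 100, 0.5
W = line_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
gamma_true = np.zeros(n)
gamma_true[[10, 60]] = 15.0          # two planted outliers
eps = rng.normal(size=n)
y = X @ beta_true + np.linalg.solve(np.eye(n) - lam * W, gamma_true + eps)

beta_hat, gamma_hat, flagged = spatial_ipod_sketch(y, X, W, lam, threshold=4.0)
print("flagged:", flagged.tolist())
print("beta_hat:", np.round(beta_hat, 2))
```

Nonzero entries of the returned γ flag candidate outliers; with shifts of 15 against unit-variance noise, the planted outliers at indices 10 and 60 should be among them. Placing γ after the spatial filter keeps each shift on a single observation, which is what makes the thresholding step separable; if the shift instead entered the unfiltered mean, (I - λW)γ would spread it across neighbors and the γ-step would need a different update.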
Results: In simulations, Spatial-Θ-IPOD outperformed several commonly used methods on the masking and JD (joint detection) indicators and performed stably even in high-dimensional scenarios. In contrast, the Θ-IPOD method was ineffective at detecting outliers when spatial correlation was present. Our model also produced reliable coefficient estimates alongside outlier detection, and in most cases it outperformed the other models considered, both robust and non-robust. In the empirical study, the proposed model detected outliers and provided useful insights for the modeling process.
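The abstract reports masking and JD indicators without defining them. The small Python sketch below computes them assuming the definitions commonly used in the Θ-IPOD literature: masking is the share of true outliers that are not flagged, swamping is the share of clean observations that are flagged, and JD (joint detection) is the share of simulation replications in which every true outlier is flagged. The helper names `masking_swamping` and `summarize` and the toy replication data are hypothetical.

```python
from statistics import mean
from typing import List, Set, Tuple


def masking_swamping(true_out: Set[int], flagged: Set[int], n: int) -> Tuple[float, float]:
    """Per-replication masking and swamping rates (assumed definitions)."""
    # Masking: fraction of true outliers the method failed to flag.
    masking = len(true_out - flagged) / len(true_out) if true_out else 0.0
    # Swamping: fraction of clean observations wrongly flagged as outliers.
    n_clean = n - len(true_out)
    swamping = len(flagged - true_out) / n_clean if n_clean else 0.0
    return masking, swamping


def summarize(runs: List[Tuple[Set[int], Set[int]]], n: int) -> Tuple[float, float, float]:
    """Average masking (M) and swamping (S) over replications, plus JD."""
    rates = [masking_swamping(t, f, n) for t, f in runs]
    m = mean(r[0] for r in rates)
    s = mean(r[1] for r in rates)
    # JD: fraction of replications with zero masking (all true outliers caught).
    jd = mean(1.0 if r[0] == 0.0 else 0.0 for r in rates)
    return m, s, jd


# Toy example: 3 replications, n = 10 observations, true outliers {1, 2}.
runs = [({1, 2}, {1, 2, 5}), ({1, 2}, {1}), ({1, 2}, {1, 2})]
M, S, JD = summarize(runs, n=10)
print(f"M={M:.3f} S={S:.3f} JD={JD:.3f}")  # M=0.167 S=0.042 JD=0.667
```

Under these assumed definitions, lower M and S and a higher JD indicate better detection.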
Conclusions: Spatial-Θ-IPOD offers an effective solution for detecting spatial outliers under the SEM while providing robust coefficient estimates. Notably, it retains its relative advantage even in the presence of high-leverage points. By identifying outliers, the method improves understanding of the data and provides valuable insights for further analysis.
Link | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323683 | PMC
http://dx.doi.org/10.1186/s12874-024-02208-3 | DOI Listing