Gaussian mixture models (GMMs) are a popular and versatile tool for exploring heterogeneity in multivariate continuous data. Arguably the most popular way to estimate GMMs is the expectation-maximization (EM) algorithm combined with model selection using the Bayesian information criterion (BIC). If the GMM is correctly specified, this estimation procedure has been shown to achieve high recovery performance. However, in many situations the data are not continuous but ordinal, for example when assessing symptom severity in medical data or modeling responses in a survey. For such situations, it is unknown how well the EM algorithm and the BIC perform in GMM recovery. In the present paper, we investigate this question by simulating data from various GMMs, thresholding them into ordinal categories, and evaluating recovery performance. We show that the number of components can be estimated reliably if the number of ordinal categories and the number of variables are high enough. However, the estimates of the component model parameters are biased regardless of sample size. Finally, we discuss alternative modeling approaches that might be adopted in situations where estimating a GMM is not acceptable.
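The simulate, threshold, and recover pipeline described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' simulation code. The function names are hypothetical, and several choices are assumptions: equally spaced thresholds over each variable's observed range, full covariance matrices, a single random-point initialization per K, a small ridge added to each covariance (ties in ordinal data can make covariances singular), and the lower-is-better BIC convention (some packages, such as mclust, report the negated value).

```python
import numpy as np


def simulate_ordinal(n, means, covs, weights, n_categories, rng):
    """Draw n points from a GMM, then cut each variable into ordinal categories 1..n_categories.
    Assumption: thresholds are equally spaced over each variable's observed range."""
    z = rng.choice(len(weights), size=n, p=weights)  # latent component labels
    X = np.empty((n, means.shape[1]))
    for j in range(len(weights)):
        idx = z == j
        X[idx] = rng.multivariate_normal(means[j], covs[j], size=int(idx.sum()))
    Y = np.empty_like(X, dtype=int)
    for c in range(X.shape[1]):
        cuts = np.linspace(X[:, c].min(), X[:, c].max(), n_categories + 1)[1:-1]
        Y[:, c] = np.digitize(X[:, c], cuts) + 1  # category codes start at 1
    return Y


def _log_gauss(X, mu, cov):
    """Log density of N(mu, cov) at each row of X, via a Cholesky factor."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, (X - mu).T)  # whitened residuals, shape (d, n)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2 * np.pi) + logdet + np.sum(sol**2, axis=0))


def _log_joint(X, w, mu, cov):
    """n x K matrix of log(w_k) + log N(x_i | mu_k, cov_k)."""
    return np.column_stack([np.log(w[j]) + _log_gauss(X, mu[j], cov[j]) for j in range(len(w))])


def _logsumexp_rows(A):
    m = A.max(axis=1, keepdims=True)
    return m + np.log(np.exp(A - m).sum(axis=1, keepdims=True))


def fit_gmm(X, K, rng, n_iter=200, tol=1e-6, reg=1e-3):
    """Fit a full-covariance GMM by EM; return the final log-likelihood."""
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)].astype(float)  # random data points as initial means
    cov = np.array([np.cov(X, rowvar=False) + reg * np.eye(d) for _ in range(K)])
    w = np.full(K, 1.0 / K)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities from the current parameters
        logp = _log_joint(X, w, mu, cov)
        lse = _logsumexp_rows(logp)
        ll = float(lse.sum())
        if ll - prev < tol * abs(ll):
            break
        prev = ll
        resp = np.exp(logp - lse)
        # M-step: weighted updates of weights, means and covariances
        nk = resp.sum(axis=0) + 1e-12
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for j in range(K):
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return float(_logsumexp_rows(_log_joint(X, w, mu, cov)).sum())


def bic(ll, n, d, K):
    """BIC (lower is better) with (K-1) weights, K*d means and K*d(d+1)/2 covariance terms."""
    p = (K - 1) + K * d + K * d * (d + 1) // 2
    return -2.0 * ll + p * np.log(n)


def select_k(X, max_k, seed=0):
    """Fit K = 1..max_k and return the K with the smallest BIC, plus all BIC values."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    bics = [bic(fit_gmm(X, k, rng), n, d, k) for k in range(1, max_k + 1)]
    return int(np.argmin(bics)) + 1, bics


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    means = np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]])
    covs = np.array([np.eye(3), np.eye(3)])
    Y = simulate_ordinal(600, means, covs, [0.5, 0.5], 5, rng)
    k_hat, bics = select_k(Y.astype(float), max_k=4)
    print("selected K:", k_hat, "BICs:", np.round(bics, 1))
```

Comparing the selected K with the generating number of components, across different numbers of categories and variables, mirrors the kind of recovery check the abstract describes. A real study would use multiple EM starts per K; the single start here only keeps the sketch short.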
Source | Type
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10250525 | PMC
http://dx.doi.org/10.3758/s13428-022-01883-8 | DOI Listing