Background: Missing data frequently create problems in the analysis of population-based data sets, such as those collected by cancer registries. Restriction of analysis to records with complete data may yield inferences that are substantially different from those that would have been obtained had no data been missing. 'Naive' methods for handling missing data, such as restriction of the analysis to complete records or creation of a 'missing' category, have drawbacks that can invalidate the conclusions from the analysis. We offer a tutorial on modern methods for handling missing data in relative survival analysis.

Methods: We estimated relative survival for 29 563 colorectal cancer patients who were diagnosed between 1997 and 2004 and registered in the North West Cancer Intelligence Service. Multiple imputation (MI) was applied to account for incomplete stage at diagnosis, a common example of missing registry data, under the missing at random (MAR) assumption. Multivariable regression with a generalized linear model and Poisson error structure was then used to estimate the excess hazard of death of the colorectal cancer patients, over and above the background mortality, adjusting for significant predictors of mortality.
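After MI, the model is fitted once per imputed data set and the resulting estimates are combined into a single inference. The abstract does not state how this pooling was done; the sketch below assumes the standard Rubin's rules, and the function name `pool_rubin` and the input numbers are hypothetical illustrations, not values from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimated in each imputed data set
               (for example, a log excess hazard ratio).
    variances: the squared standard error of that coefficient in each data set.
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")

    pooled = mean(estimates)        # average of the per-imputation estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical log excess hazard ratios from three imputed data sets.
estimate, std_error = pool_rubin([1.0, 1.2, 1.4], [0.04, 0.04, 0.04])
print(round(estimate, 3), round(std_error, 3))
```

The between-imputation term is what carries the extra uncertainty due to the missing values: if every imputed data set gave the same estimate, the pooled standard error would reduce to the ordinary within-imputation one.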

Results: Incomplete information on stage, morphology and grade meant that only 55% of the records could be included in the 'complete-case' analysis. All cases could be included when the indicator method (IM) or MI was used. Handling missing data by MI produced significantly lower estimates of the excess mortality for stage, morphology and grade than complete-case analysis, with the largest reductions occurring for late-stage and high-grade tumours.
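The difference between the two 'naive' approaches can be sketched on a toy data set. The records, field names and helper functions below are hypothetical and are not taken from the registry data; they only illustrate how complete-case analysis drops records while the indicator method keeps them by recoding missing values.

```python
FIELDS = ("stage", "morphology", "grade")

# Hypothetical records; None marks a missing value.
records = [
    {"stage": "I", "morphology": "adenocarcinoma", "grade": "low"},
    {"stage": None, "morphology": "adenocarcinoma", "grade": "high"},
    {"stage": "IV", "morphology": None, "grade": None},
    {"stage": "II", "morphology": "adenocarcinoma", "grade": "high"},
]


def complete_cases(rows, fields=FIELDS):
    """Complete-case analysis: keep only rows with no missing field."""
    return [row for row in rows if all(row[f] is not None for f in fields)]


def indicator_method(rows, fields=FIELDS):
    """Indicator method: keep every row, recoding missing values as 'missing'."""
    return [
        {**row, **{f: "missing" if row[f] is None else row[f] for f in fields}}
        for row in rows
    ]


kept = complete_cases(records)
recoded = indicator_method(records)
print(len(kept), "of", len(records), "records kept by complete-case analysis")
print(len(recoded), "of", len(records), "records kept by the indicator method")
```

In this toy example the indicator method retains all four records, but the two records with a missing stage or grade are lumped into a 'missing' category that mixes patients whose true values may differ widely.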

Conclusion: In complete-case analysis, almost half (45%) of the records could not be included, and with the IM, all records with missing values for stage were combined into a single 'missing' category. We show that MI greatly improved the results by exploiting all the information in the incomplete records. The method also helped to ensure that efficient inferences about survival were made from the multivariable regression analyses.

Source
http://dx.doi.org/10.1093/ije/dyp309
