Background: Missing data in electronic health records are highly prevalent and result in analytical concerns such as heterogeneous sources of bias and loss of statistical power. One simple analytic method for addressing missing or unknown covariate values is to treat missingness for a particular variable as a category unto itself, which we refer to as the missing indicator method. For cross-sectional analyses, recent work suggested that there was minimal benefit to the missing indicator method; however, it is unclear how this approach performs in the setting of longitudinal data, in which correlation among clustered repeated measures may be leveraged for potentially improved model performance.
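The missing indicator method can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: for a categorical covariate, unknown values become an extra level, and for a continuous covariate (an assumed extension often paired with the method) missing values are set to a constant and flagged with a 0/1 indicator column. The function names, the `"Missing"` label, the fill value of 0 and the example covariates are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

MISSING = "Missing"  # hypothetical label for the extra category


def add_missing_category(values: Sequence[Optional[str]]) -> List[str]:
    """Categorical covariate: recode unknown values (None) as a level of their own."""
    return [MISSING if v is None else v for v in values]


def add_missing_indicator(
    values: Sequence[Optional[float]], fill: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Continuous covariate: replace missing values with a constant and return a
    0/1 indicator (1 = missing); both columns then enter the model together."""
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


# Usage with hypothetical covariates
smoking = add_missing_category(["never", None, "current"])  # ["never", "Missing", "current"]
ldl, ldl_missing = add_missing_indicator([130.0, None, 95.0])  # [130.0, 0.0, 95.0], [0, 1, 0]
```

In a longitudinal model the same recoding would be applied row by row, so a participant can move in and out of the "Missing" category across visits.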
Random forest (RF) regression is a popular machine learning method for developing prediction models for continuous outcomes. Variable selection, also known as feature selection or reduction, involves selecting a subset of predictor variables for modeling. Potential benefits of variable selection are methodologic (i.
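RF variable selection is commonly driven by a variable importance measure. One widely used measure is permutation importance: shuffle one predictor, breaking its link to the outcome, and record how much the prediction error rises. The stdlib sketch below assumes `model` is any fitted prediction function (a stand-in for a fitted RF) and uses mean squared error; it is one general approach and not necessarily a procedure evaluated in the article. The function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(model: Callable[[Row], float], X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of a fitted regression model on (X, y)."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, seed: int = 0) -> List[float]:
    """Increase in MSE when each predictor column is shuffled in turn.
    Predictors with importance near zero are candidates for removal."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the association between predictor j and y
        X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
        importances.append(mse(model, X_perm, y) - baseline)
    return importances


# Usage: a stand-in "fitted model" that uses only the first predictor
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]
imp = permutation_importance(lambda row: 2.0 * row[0], X, y)
# imp[1] is 0.0 because the model ignores the second predictor
```

In practice importance is usually computed on out-of-bag or held-out data and averaged over several shuffles before any threshold is applied.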
J Gerontol A Biol Sci Med Sci
December 2024
Introduced in 2010, the subdiscipline of gerontologic biostatistics was conceptualized to address the specific challenges of analyzing data from clinical research studies involving older adults. Since then, the evolving technological landscape has led to a proliferation of advancements in biostatistics and other data sciences that have significantly influenced the practice of gerontologic research, including studies beyond the clinic. Data science is the field at the intersection of statistics and computer science, and although the term "data science" was not widely used in 2010, the field has quickly had a palpable effect on gerontologic research.
Background: Composite time-to-event endpoints are beneficial for assessing related outcomes jointly in clinical trials, but components of the endpoint may have different censoring mechanisms. For example, in the PRagmatic EValuation of evENTs And Benefits of Lipid-lowering in oldEr adults (PREVENTABLE) trial, the composite outcome contains one endpoint that is right censored (all-cause mortality) and two endpoints that are interval censored (dementia and persistent disability). Although Cox regression is an established method for time-to-event outcomes, it is unclear how models perform under differing component-wise censoring schemes in large clinical trial data.
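The censoring problem can be made concrete. If each component event time is only known to lie in an interval, then the composite (first) event time is also only known to lie in an interval, so a composite that inherits interval censoring from any component no longer fits the exact-or-right-censored data structure that standard Cox regression assumes. The sketch below is a minimal illustration under simplifying assumptions (closed intervals, months as the time unit, a hypothetical participant and hypothetical names); it does not reproduce the PREVENTABLE analysis.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    """Event time known only to lie in [left, right].
    Exact event: left == right. Right censored at time c: left = c, right = inf."""
    left: float
    right: float


def composite_first_event(components: List[Interval]) -> Interval:
    """Bounds on the first (earliest) component event time.

    Each T_k lies in [L_k, R_k], so min_k T_k lies in [min_k L_k, min_k R_k]:
    the composite is interval censored unless every R_k is infinite (then it is
    right censored) or the two minima coincide (then it is exact).
    """
    return Interval(
        left=min(c.left for c in components),
        right=min(c.right for c in components),
    )


# Hypothetical participant (months): alive at last contact (36), dementia first
# detected between visits at 12 and 24, no disability through month 30.
death = Interval(36.0, math.inf)
dementia = Interval(12.0, 24.0)
disability = Interval(30.0, math.inf)
composite = composite_first_event([death, dementia, disability])  # Interval(left=12.0, right=24.0)
```

Treating such a participant as having an exact composite event at month 24 (the detection visit) is one simplification a Cox model would require; the interval view shows what that simplification discards.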
Background: Little is known about the relationship between lipoprotein (a) [Lp(a)] and high-sensitivity C-reactive protein (hsCRP) and their joint association with atherosclerotic cardiovascular disease (ASCVD).
Objectives: The purpose of this study was to assess whether Lp(a)-associated ASCVD risk is modified by hsCRP in the context of primary prevention.
Methods: The current study included 4,679 participants from the MESA (Multi-Ethnic Study of Atherosclerosis) Apolipoprotein ancillary data set.
Background: Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to a lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest.
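The core step of a classification tree is choosing the split that makes the child nodes as pure as possible. The sketch below scores candidate thresholds on a single numeric predictor with Gini impurity, the criterion used by CART for classification; a full tree repeats this search over all predictors and recursively within each child node, and a random forest grows many such trees. The function names and example data are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted child impurity) for the rule x <= threshold
    with the lowest impurity; threshold is nan if no split is possible."""
    best = (float("nan"), float("inf"))
    n = len(y)
    for t in sorted(set(x))[:-1]:  # splitting at the largest value would leave the right child empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Usage: the rule "x <= 3" separates the two classes perfectly
threshold, impurity = best_split([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1])
# threshold == 3.0, impurity == 0.0
```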
Commun Stat Simul Comput
September 2018
Chemometr Intell Lab Syst
February 2019
Clustered binary outcomes and datasets with many predictor variables are frequently encountered in clinical research (e.g. longitudinal studies).
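With clustered outcomes, methods that resample data (bootstrap validation, or the bootstrap samples inside a random forest) are often adapted to resample whole clusters rather than individual rows, so that correlated repeated measures are never split between samples. The sketch below is a minimal cluster bootstrap, assuming each row carries a cluster ID (for example, a patient ID); it is one general approach, not necessarily the method proposed in the article, and the names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, TypeVar

R = TypeVar("R")


def cluster_bootstrap(rows: Sequence[R], cluster_ids: Sequence[Hashable], seed: int = 0) -> List[R]:
    """Resample whole clusters with replacement, keeping every repeated
    measure of a sampled cluster together."""
    by_cluster = defaultdict(list)
    for row, cid in zip(rows, cluster_ids):
        by_cluster[cid].append(row)
    ids = list(by_cluster)
    rng = random.Random(seed)
    sample: List[R] = []
    for cid in rng.choices(ids, k=len(ids)):  # draw as many clusters as there are
        sample.extend(by_cluster[cid])
    return sample


# Usage: (patient_id, visit) rows; patients have 2, 1 and 2 visits
rows = [("p1", 0), ("p1", 1), ("p2", 0), ("p3", 0), ("p3", 1)]
sample = cluster_bootstrap(rows, [r[0] for r in rows], seed=1)
```

Rows drawn this way preserve the within-patient correlation, while the rows a patient contributes never appear in both a training sample and its out-of-bag complement.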
Background/objective: Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients during the first week of hospitalization often presents significant challenges. Current models such as the King's College Criteria (KCC) and the Acute Liver Failure Study Group (ALFSG) Prognostic Index were developed to predict outcome using only data from a single time point, hospital admission. Models using longitudinal data are not currently available for APAP-ALF patients.
Purpose: To evaluate associations among preoperative diagnosis, soft contact lens (SCL) retention, and complications.
Methods: A retrospective chart review was conducted of 92 adult patients (103 eyes) who received a Boston keratoprosthesis type I at the Massachusetts Eye and Ear Infirmary or the Flaum Eye Institute. Records were reviewed for preoperative diagnosis, SCL retention, and subsequent complications.
Background/aim: Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients often presents significant challenges. The King's College Criteria (KCC) have been validated at hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART) models.
Classification of objects into pre-defined groups based on known information is a fundamental problem in the field of statistics. Although approaches for solving this problem exist, finding an accurate classification method can be challenging in an orphan disease setting, where data are minimal and often not normally distributed. The purpose of this paper is to illustrate the application of the random forest (RF) classification procedure in a real clinical setting and discuss typical questions that arise in the general classification framework as well as offer interpretations of RF results.
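The RF classification procedure can be sketched with the Python standard library: each tree is grown on a bootstrap sample of the rows using a random subset of predictors, and a new observation is assigned the class that receives the most tree votes. To stay short, this sketch grows one-split trees (stumps) chosen by misclassification count instead of fully grown trees split by Gini impurity, so it illustrates the mechanics only; `n_trees`, `m_features`, the function names and the example data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature, threshold, label if <= threshold, label if >)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], features: Sequence[int]) -> Stump:
    """One-split tree: pick the feature/threshold with the fewest misclassifications."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(v != left_label for v in left) + sum(v != right_label for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    if best is None:  # no usable split: predict the majority class everywhere
        label = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), label, label)
    return best[1:]


def fit_forest(X, y, n_trees: int = 25, m_features: int = 1, seed: int = 0) -> List[Stump]:
    """Grow each stump on a bootstrap sample of rows and a random subset of predictors."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(p), m_features)  # random predictor subset for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> int:
    """Majority vote across trees."""
    votes = Counter(left if row[j] <= t else right for j, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Usage: two predictors that both separate the classes
X = [[float(i), 2.0 * i] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
forest = fit_forest(X, y)
```

A full RF draws a fresh predictor subset at every node; with stumps each tree has a single node, so drawing once per tree is equivalent here. The rows left out of each bootstrap sample (out-of-bag rows) are what RF implementations typically use to estimate error and variable importance.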