Publications by authors named "Jaime Lynn Speiser"

Background: Missing data in electronic health records are highly prevalent and result in analytical concerns such as heterogeneous sources of bias and loss of statistical power. One simple analytic method for addressing missing or unknown covariate values is to treat missingness for a particular variable as a category unto itself, which we refer to as the missing indicator method. For cross-sectional analyses, recent work suggested that there was minimal benefit to the missing indicator method; however, it is unclear how this approach performs in the setting of longitudinal data, in which correlation among clustered repeated measures may be leveraged for potentially improved model performance.
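As a rough illustration of the missing indicator method described above, the sketch below recodes missing values of a categorical covariate as their own level and then dummy-codes the result. The variable `smoking_status`, the label `"Missing"`, and both helper functions are hypothetical names chosen for this example; they are not taken from the article.

```python
from typing import List, Optional, Tuple

MISSING_LABEL = "Missing"  # hypothetical label for the extra category


def add_missing_category(values: List[Optional[str]]) -> List[str]:
    """Recode missing (None) values as their own category."""
    return [MISSING_LABEL if v is None else v for v in values]


def one_hot(values: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Dummy-code a categorical variable; returns (levels, indicator rows)."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical covariate with two unknown values.
smoking_status = ["never", None, "current", "former", None]

recoded = add_missing_category(smoking_status)
levels, design = one_hot(recoded)

print(levels)
for row in design:
    print(row)
```

No records are dropped: every observation keeps a row in the design matrix, and the extra column lets a model estimate a separate effect for records whose value is unknown.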


Random forest (RF) regression is a popular machine learning method to develop prediction models for continuous outcomes. Variable selection, also known as feature selection or reduction, involves selecting a subset of predictor variables for modeling. Potential benefits of variable selection are methodologic (i.


Introduced in 2010, the subdiscipline of gerontologic biostatistics was conceptualized to address the specific challenges of analyzing data from clinical research studies involving older adults. Since then, the evolving technological landscape has led to a proliferation of advancements in biostatistics and other data sciences that have significantly influenced the practice of gerontologic research, including studies beyond the clinic. Data science is the field at the intersection of statistics and computer science, and although the term "data science" was not widely used in 2010, the field has quickly made palpable effects on gerontologic research.


Background: Composite time-to-event endpoints are beneficial for assessing related outcomes jointly in clinical trials, but components of the endpoint may have different censoring mechanisms. For example, in the PRagmatic EValuation of evENTs And Benefits of Lipid-lowering in oldEr adults (PREVENTABLE) trial, the composite outcome contains one endpoint that is right censored (all-cause mortality) and two endpoints that are interval censored (dementia and persistent disability). Although Cox regression is an established method for time-to-event outcomes, it is unclear how models perform under differing component-wise censoring schemes for large clinical trial data.
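To make the mixed censoring concrete, here is a minimal sketch of one possible way to represent each component as an interval known to contain its event time, and to derive an interval for the composite (earliest) event. The `Interval` type, the `composite_interval` helper, and the example times are all assumptions made for illustration; this is not necessarily how the PREVENTABLE analysis handles the endpoint.

```python
import math
from typing import List, NamedTuple


class Interval(NamedTuple):
    lower: float  # latest time the event is known not to have occurred yet
    upper: float  # earliest time the event is known to have occurred;
                  # math.inf when the component is right censored


def composite_interval(components: List[Interval]) -> Interval:
    """Interval containing the earliest event time across all components.

    Every component time exceeds its own lower bound, so the minimum
    exceeds the smallest lower bound; likewise the minimum cannot exceed
    the smallest upper bound.
    """
    return Interval(
        lower=min(c.lower for c in components),
        upper=min(c.upper for c in components),
    )


# Hypothetical participant (times in years since randomization).
death = Interval(lower=5.0, upper=math.inf)       # alive at last contact
dementia = Interval(lower=2.0, upper=3.0)         # detected between visits
disability = Interval(lower=4.0, upper=math.inf)  # none at last assessment

composite = composite_interval([death, dementia, disability])
print(composite)
```

Under this representation, a participant with no observed events ends up with an infinite upper bound, meaning the composite is right censored at the earliest of the component censoring times.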


Background: Little is known about the relationship between lipoprotein(a) [Lp(a)] and high-sensitivity C-reactive protein (hsCRP) and their joint association with atherosclerotic cardiovascular disease (ASCVD).

Objectives: The purpose of this study was to assess whether Lp(a)-associated ASCVD risk is modified by hsCRP in the context of primary prevention.

Methods: The current study included 4,679 participants from the MESA (Multi-Ethnic Study of Atherosclerosis) Apolipoprotein ancillary data set.

Article Synopsis
  • Machine learning, particularly Binary Mixed Model (BiMM) forest, is being explored to create medical prediction models for complex datasets, which could enhance clinical decision-making by simplifying the data collection process through effective feature selection.
  • A simulation study was conducted to compare the performance of BiMM forest with feature selection methods against traditional linear mixed model techniques, specifically evaluating their efficiency and ability to accurately identify important features.
  • Results showed that BiMM forest with backward elimination generally had improved computational efficiency and comparable accuracy and predictive performance when modeling mobility disability in older adults, making it a promising approach for medical prediction modeling.
Article Synopsis
  • Random forest classification is a widely used machine learning technique for creating predictive models, but there's limited guidance on which variable selection methods to use for different types of datasets.
  • The study evaluates various variable selection methods using 311 online classification datasets, analyzing factors like prediction error rates, computation times, and overall model performance.
  • Findings suggest that Jiang's method and certain R package methods are the most effective for most datasets, particularly favoring R methods for datasets with many predictors due to their computational efficiency.

Background: Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest.

Article Synopsis
  • Clustered binary outcomes in clinical research often present challenges for traditional generalized linear mixed models (GLMMs), especially with complex interactions and unknown non-linear predictors.
  • The Binary Mixed Model (BiMM) tree is a new, data-driven method that integrates decision trees with GLMMs to address these challenges.
  • Simulation results indicate that BiMM tree performs as well as or better than standard methods in terms of accuracy, and it has been successfully applied to data from the Acute Liver Failure Study Group.

Clustered binary outcomes and datasets with many predictor variables are frequently encountered in clinical research (e.g., longitudinal studies).


Background/objective: Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients during the first week of hospitalization often presents significant challenges. Current models such as the King's College Criteria (KCC) and the Acute Liver Failure Study Group (ALFSG) Prognostic Index are developed to predict outcome using only a single time point on hospital admission. Models using longitudinal data are not currently available for APAP-ALF patients.


Purpose: To evaluate associations between preoperative diagnosis, soft contact lens (SCL) retention and complications.

Methods: A retrospective chart review was conducted of 92 adult patients (103 eyes) who received a Boston keratoprosthesis type I at the Massachusetts Eye and Ear Infirmary or the Flaum Eye Institute. Records were reviewed for preoperative diagnosis, SCL retention and subsequent complications.


Background/aim: Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients often presents significant challenges. The King's College Criteria (KCC) have been validated on hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART) models.


Classification of objects into pre-defined groups based on known information is a fundamental problem in the field of statistics. Although approaches for solving this problem exist, finding an accurate classification method can be challenging in an orphan disease setting, where data are minimal and often not normally distributed. The purpose of this paper is to illustrate the application of the random forest (RF) classification procedure in a real clinical setting and discuss typical questions that arise in the general classification framework as well as offer interpretations of RF results.
