Publications by authors named "Grace Y Yi"

In causal inference, the estimation of the average treatment effect is often of interest. For example, in cancer research, an interesting question is to assess the effects of the chemotherapy treatment on cancer, with the information of gene expressions taken into account. Two crucial challenges in this analysis involve addressing measurement error in gene expressions and handling noninformative gene expressions.

Trivariate joint modeling of longitudinal count data, recurrent events, and a terminal event for family data has attracted increasing interest in medical studies. For example, families with Lynch syndrome (LS) are at high risk of developing colorectal cancer (CRC), where the number of polyps and the frequency of colonoscopy screening visits are highly associated with the risk of CRC among individuals and families. To assess how screening visits influence polyp detection, which in turn influences time to CRC, we propose a clustered trivariate joint model.

Research on dynamic treatment regimes has attracted extensive interest. Many methods have been proposed in the literature; however, they are vulnerable to the presence of misclassification in covariates. In particular, although Q-learning has received considerable attention, its applicability to data with misclassified covariates is unclear.

While the impact of the COVID-19 pandemic has been widely studied, relatively few discussions of the public's sentimental reaction are available. In this article, we scrape COVID-19 related tweets from the microblogging platform Twitter and examine the tweets from February 24, 2020 to October 14, 2020 in four Canadian cities (Toronto, Montreal, Vancouver, and Calgary) and four U.S.

In the framework of causal inference, the inverse probability weighting estimation method and its variants have been commonly employed to estimate the average treatment effect. Such methods, however, are challenged by the presence of irrelevant pre-treatment variables and measurement error. Ignoring these features and naively applying the usual inverse probability weighting estimation procedures may typically yield biased inference results.

The coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread stealthily and presented a tremendous threat to the public. It is important to investigate the transmission dynamics of COVID-19 to help understand the impact of the disease on public health and the economy. In this article, we develop a new epidemic model that utilizes a set of ordinary differential equations with unknown parameters to delineate the transmission process of COVID-19.

Providing sensible estimates of the mean incubation time for COVID-19 is important yet complex. This study aims to provide synthetic estimates of the mean incubation time of COVID-19 by capitalizing on available estimates reported in the literature and exploring different ways to accommodate heterogeneity involved in the reported studies. Online databases between January 1, 2020 and May 20, 2021 are first searched to obtain estimates of the mean incubation time of COVID-19, and meta-analyses are then conducted to generate synthetic estimates.

Zero-inflated count data arise frequently from genomics studies. Analysis of such data is often based on a mixture model that accommodates excess zeros in combination with a Poisson distribution, and various inference methods have been proposed under such a model. Those analysis procedures, however, are challenged by the presence of measurement error in count responses.

Autoregressive (AR) models are useful in time series analysis. Inferences under such models are distorted in the presence of measurement error, a common feature in applications. In this article, we establish analytical results for quantifying the biases of the parameter estimation in AR models if the measurement error effects are neglected.

Research on complex associations between a gene network and multiple responses has attracted increasing attention. A great challenge in analyzing genetic data is posed by the genetic network, which is typically unknown. Moreover, mismeasurement of responses introduces additional complexity that distorts the usual inferential procedures.

Background: The coronavirus disease 2019 (COVID-19) pandemic has had a significant influence on public mental health. Current efforts focus on alleviating the impacts of the disease on public health and the economy, with the psychological effects of COVID-19 relatively ignored. In this research, we explore a quantitative characterization of the pandemic's impact on public mental health by studying an online survey dataset from the United States.

We consider accelerated failure time models with error-prone time-to-event outcomes. The proposed models extend the conventional accelerated failure time model by allowing time-to-event responses to be subject to measurement errors. We describe two measurement error models, a logarithm transformation regression measurement error model and an additive error model with a positive increment, to delineate possible scenarios of measurement error in time-to-event outcomes.

To confine the spread of an infectious disease, setting a sensible quarantine time is crucial. To this end, it is imperative to understand well the distribution of incubation times of the disease. Regarding the ongoing COVID-19 pandemic, 14 days is commonly taken as the quarantine time to curb the spread of the virus while balancing the impacts of COVID-19 on diverse aspects of society, including public health, the economy, and humanity perspectives, etc.

Data with a huge size present great challenges in modeling, inferences, and computation. In handling big data, much attention has been directed to settings with "large p small n", and relatively less work has been done to address problems with p and n being both large, though data with such a feature have now become more accessible than before, where p represents the number of variables and n stands for the sample size. The big volume of data does not automatically ensure good quality of inferences because a large number of unimportant variables may be collected in the process of gathering informative variables.

Bivariate responses with mixed continuous and binary variables arise commonly in applications such as clinical trials and genetic studies. Statistical methods based on jointly modeling continuous and binary variables have been available. However, such methods ignore the effects of response mismeasurement, a ubiquitous feature in applications.

Background: Since March 11, 2020, when the World Health Organization (WHO) declared the COVID-19 pandemic, the number of infected cases, the number of deaths, and the number of affected countries have climbed rapidly. To understand the impact of COVID-19 on public health, many studies have been conducted for various countries. To complement the available work, in this article we examine Canadian COVID-19 data for the period of March 18, 2020 to August 16, 2020 with the aim of forecasting the dynamic trend in the short term.

In genetic association studies, mixed effects models have been widely used to detect pleiotropic effects, which occur when one gene affects multiple phenotypic traits. In particular, bivariate mixed effects models are useful for describing the association of a gene with a continuous trait and a binary trait. However, such models are inadequate for data with response mismeasurement, a characteristic that is often overlooked.

In survival data analysis, the Cox proportional hazards (PH) model is perhaps the most widely used model to feature the dependence of survival times on covariates. While many inference methods have been developed under such a model or its variants, those models are not adequate for handling data with complex structured covariates. High-dimensional survival data often entail several features: (1) many covariates are inactive in explaining the survival information, (2) active covariates are associated in a network structure, and (3) some covariates are error-contaminated.

The coronavirus disease-2019 (COVID-19) has been found to be caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, comprehensive knowledge of COVID-19 remains incomplete and many important features are still unknown. This manuscript conducts a meta-analysis and a sensitivity study to answer the questions: What is the basic reproduction number? How long is the incubation time of the disease on average? What portion of infections are asymptomatic? And ultimately, what is the case fatality rate? Our studies estimate the basic reproduction number to be 3.

Causal inference has been widely conducted in various fields, and many methods have been proposed for different settings. However, for noisy data with both mismeasurements and missing observations, those methods often break down. In this paper, we consider the problem in which binary outcomes are subject to both missingness and misclassification, when interest lies in estimating the average treatment effect (ATE).

Inverse-probability-of-treatment weighted (IPTW) estimation has been widely used to consistently estimate the causal parameters in marginal structural models, with time-dependent confounding effects adjusted for. Just like other causal inference methods, the validity of IPTW estimation typically requires the crucial condition that all variables are precisely measured. However, this condition is often violated in practice for various reasons.

It is well established that measurement error has a drastically negative impact on data analysis. It can not only bias parameter estimates but may also cause a loss of power for testing relationships between variables. Although survival analysis of error-contaminated data has attracted extensive interest, relatively little attention has been paid to dealing with survival data with error-contaminated covariates when the underlying population is characterized by a cured fraction.

In survival analysis, accelerated failure time models are useful in modeling the relationship between failure times and the associated covariates, where covariate effects are assumed to appear in a linear form in the model. Such an assumption of covariate effects is, however, quite restrictive for many practical problems. To incorporate flexible nonlinear relationships between covariates and transformed failure times, we propose partially linear single index models that accommodate complex relationships between transformed failure times and covariates.

Measurement error and misclassification have long been a concern in many fields, including medicine, administrative health care data, epidemiology, and survey sampling. It is known that measurement error and misclassification may seriously degrade the quality of estimation and inference, and should be avoided whenever possible. However, in practice, it is inevitable that measurements contain error for a variety of reasons.
