Publications by authors named "William Fithian"

Article Synopsis
  • In vivo T cell screens are essential for understanding immunity, but there is currently no agreed-upon design for their implementation, including factors such as gene library size, the number of cells transferred, and the number of mice used in experiments.
  • The Framework for In vivo T cell Screens (FITS) is introduced to standardize these parameters, ensuring robust and effective experimental outcomes across various contexts.
  • As a practical application, the researchers used FITS to enhance a CD8+ T cell screen in a tumor model, incorporating unique molecular identifiers (UMIs) to boost statistical analysis and monitor T cell behavior linked to different gene knockouts.

Motivation: Microbiome datasets provide rich information about microbial communities. However, vast library size variations across samples present great challenges for proper statistical comparisons. To deal with these challenges, rarefaction is often used in practice as a normalization technique, although there has been debate whether rarefaction should ever be used.
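
As an illustration of the normalization the abstract refers to, rarefaction can be sketched as subsampling each sample's reads, without replacement, down to a common depth. This is a minimal sketch, not the authors' procedure: the function name `rarefy`, the toy count vectors, and the choice of the smallest library size as the depth are all assumptions made for the example.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a vector of taxon counts to `depth` reads without replacement.

    `counts[i]` is the number of reads assigned to taxon i (hypothetical data).
    """
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the sample's library size")
    rng = random.Random(seed)
    # Expand the counts into one taxon label per read, then draw `depth` reads.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    sampled = rng.sample(reads, depth)
    rarefied = [0] * len(counts)
    for taxon in sampled:
        rarefied[taxon] += 1
    return rarefied


# Two toy samples whose library sizes differ tenfold (assumed values).
samples = {"A": [50, 30, 20, 0], "B": [500, 250, 200, 50]}

# Assumption: rarefy every sample to the smallest library size.
depth = min(sum(c) for c in samples.values())
rarefied = {name: rarefy(c, depth) for name, c in samples.items()}
```

After rarefying, every sample has the same total read count, which is what makes the samples directly comparable; the cost is that reads from deeper samples are discarded, which is one source of the debate the abstract mentions.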


To most applied statisticians, a fitting procedure's degrees of freedom is synonymous with its model complexity, or its capacity for overfitting to data. In particular, it is often used to parameterize the bias-variance tradeoff in model selection. We argue that, on the contrary, model complexity and degrees of freedom may correspond very poorly.


Presence-only records may provide data on the distributions of rare species, but commonly suffer from large, unknown biases due to their typically haphazard collection schemes. Presence-absence or count data collected in systematic, planned surveys are more reliable but typically less abundant. We proposed a probabilistic model to allow for joint analysis of presence-only and survey data to exploit their complementary strengths.


Statistical modeling of presence-only data has attracted much recent attention in the ecological literature, leading to a proliferation of methods, including the inhomogeneous Poisson process (IPP) model, maximum entropy (Maxent) modeling of species distributions and logistic regression models. Several recent articles have shown the close relationships between these methods. We explain why the IPP intensity function is a more natural object of inference in presence-only studies than occurrence probability (which is only defined with reference to quadrat size), and why presence-only data only allows estimation of relative, and not absolute intensity of species occurrence.


For classification problems with significant class imbalance, subsampling can reduce computational costs at the price of inflated variance in estimating model parameters. We propose a method for subsampling efficiently for logistic regression by adjusting the class balance locally in feature space via an accept-reject scheme. Our method generalizes standard case-control sampling, using a pilot estimate to preferentially select examples whose responses are conditionally rare given their features.
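
The accept-reject step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the acceptance probability is |y - p(x)|, where p(x) is the pilot model's predicted probability, so that an example is kept more often when its label is surprising given its features. The helper names, the pilot coefficients, and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def acceptance_probability(x, label, pilot_theta):
    """Assumed rule: accept with probability |y - p_pilot(x)|."""
    p = sigmoid(sum(t * v for t, v in zip(pilot_theta, x)))
    return abs(label - p)


def local_case_control_sample(X, y, pilot_theta, seed=0):
    """Return the indices of the examples kept by the accept-reject scheme."""
    rng = random.Random(seed)
    kept = []
    for i, (x, label) in enumerate(zip(X, y)):
        if rng.random() < acceptance_probability(x, label, pilot_theta):
            kept.append(i)
    return kept


# Toy imbalanced data: each row is [intercept, feature]; positives are rare.
X = [[1.0, f / 10.0] for f in range(-20, 21)]
y = [1 if row[1] > 1.5 else 0 for row in X]

# Hypothetical pilot estimate, e.g. from a fit on a small uniform subsample.
pilot_theta = [-3.0, 1.5]

kept = local_case_control_sample(X, y, pilot_theta)
```

Examples that the pilot model already predicts confidently and correctly are mostly discarded, while conditionally rare responses are mostly retained. A model fit on the retained subsample would still need an adjustment for this selection, which the sketch does not show.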
