Bootstrapping with models for count data.

J Biopharm Stat

Western EcoSystems Technology, Inc., Laramie, Wyoming 82070, USA.

Published: November 2011

Two methods of bootstrap resampling are discussed for log-linear models of count data. The first resamples observations; the second resamples Pearson residuals, taking into account changes in the distribution of the residuals associated with the expected values of the counts. Both methods are illustrated on two data sets: one concerns the number of ear infections of swimmers in relation to whether they are frequent swimmers and three other variables, and the other concerns the number of visits to a doctor in the last 2 weeks in relation to the age of subjects and 10 other variables. A third data set, on the number of marine mammal interactions in different years and fishing areas, is also used as an example. In this case only the second bootstrap method can be used, because the nature of the data means that resampling observations would produce data sets that could not have occurred in practice. Simulation results indicate that the bootstrap performs slightly better than a conventional analysis for the first data set and much better for the second. For the third data set, however, a conventional analysis works well, while the bootstrap analyses have problems.
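The two resampling schemes named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names (`fit_poisson`, `case_bootstrap`, `pearson_residual_bootstrap`) are hypothetical, the data are simulated rather than taken from the paper, and the residual scheme is the plain Pearson residual bootstrap. It does not implement the paper's adjustment for how the residual distribution changes with the expected count.

```python
import numpy as np


def fit_poisson(X, y, n_iter=50, tol=1e-8):
    """Fit a log-linear Poisson model by iteratively reweighted least squares.

    Assumes column 0 of X is an intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(max(y.mean(), 1e-8))  # start at the intercept-only fit
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)  # guard against overflow in exp
        mu = np.exp(eta)
        z = eta + (y - mu) / mu               # working response
        XtW = X.T * mu                        # IRLS weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def case_bootstrap(X, y, n_boot, rng):
    """Method 1: resample whole observations (rows) with replacement."""
    n = len(y)
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out[b] = fit_poisson(X[idx], y[idx])
    return out


def pearson_residual_bootstrap(X, y, n_boot, rng):
    """Method 2 (simplified): resample Pearson residuals, keep the design fixed."""
    n = len(y)
    mu = np.exp(X @ fit_poisson(X, y))
    resid = (y - mu) / np.sqrt(mu)            # Pearson residuals
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        r_star = rng.choice(resid, size=n, replace=True)
        # Rebuild counts; round and truncate at zero (an assumption of this sketch)
        y_star = np.maximum(0.0, np.rint(mu + np.sqrt(mu) * r_star))
        out[b] = fit_poisson(X, y_star)
    return out


# Simulated example data (illustrative only, not from the paper)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta)).astype(float)

beta_hat = fit_poisson(X, y)
se_case = case_bootstrap(X, y, 200, rng).std(axis=0, ddof=1)
se_resid = pearson_residual_bootstrap(X, y, 200, rng).std(axis=0, ddof=1)
print("estimates:", beta_hat)
print("bootstrap SE (cases):", se_case)
print("bootstrap SE (Pearson residuals):", se_resid)
```

Because the residual scheme keeps the design matrix fixed, it remains usable when resampling rows would generate impossible data sets, which is the situation the abstract describes for the marine mammal example.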

Source
http://dx.doi.org/10.1080/10543406.2011.607748

Similar Publications

Adaptive weighted progressive iterative approximation based on coordinate decomposition.

PLoS One

January 2025

School of Mathematics and Finance, Hunan University of Humanities, Science and Technology, Loudi, China.

During the iterative process of the progressive iterative approximation, it is necessary to calculate the difference between the current interpolation curve and the corresponding data points, known as the adjustment vector. To achieve more precise adjustments of control points, this paper decomposes the adjustment vector into its coordinate components and introduces a weight for each component. By dynamically adjusting these weights, we can accelerate the convergence of iterations and enhance approximation accuracy.

Average nucleotide identity (ANI) is a widely used metric to estimate genetic relatedness, especially in microbial species delineation. While ANI calculation has been well optimized for bacteria and closely related viral genomes, accurate estimation of ANI below 80%, particularly in large reference data sets, has been challenging due to a lack of accurate and scalable methods. To bridge this gap, we introduce MANIAC, an efficient computational pipeline optimized for estimating ANI and alignment fraction (AF) in viral genomes with divergence around ANI of 70%.

pLM4CPPs: Protein Language Model-Based Predictor for Cell Penetrating Peptides.

J Chem Inf Model

January 2025

Department of Grain Science and Industry, Kansas State University, Manhattan, Kansas 66506, United States.

Cell-penetrating peptides (CPPs) are short peptides capable of penetrating cell membranes, making them valuable for drug delivery and intracellular targeting. Accurate prediction of CPPs can streamline experimental validation in the lab. This study aims to assess pretrained protein language models (pLMs) for their effectiveness in representing CPPs and develop a reliable model for CPP classification.

Purpose: To address the extent to which Federally Qualified Health Centers (FQHCs) and independent and provider-based Rural Health Clinics (RHCs) were using telehealth prior to and during the COVID-19 pandemic.

Methods: A nationally representative 5% sample of Medicare Fee-for-Service beneficiaries who used outpatient services at FQHCs and RHCs were identified within the 2019-2021 5% Medicare Limited Data Set Outpatient and Carrier files. Rural-Urban Continuum Codes were used to identify rural-urban clinic locations.

This data set includes the spatial model of the thickness and distribution of fine-grained floodplain deposits in the Leipzig floodplain area. The data set originates from borehole records provided by the Saxon State Office for Environment, Agriculture, and Geology [1]. The data processing involved the categorization of the stratigraphic descriptions of the borehole logs.
