In disease modeling, a key statistical problem is estimating lower and upper tail probabilities of health events from small data sets of limited range. Under such constraints, we describe a computational framework for systematically fusing observations from multiple sources to compute tail probabilities that could not otherwise be obtained for lack of lower- or upper-tail data. The estimation of multivariate lower and upper tail probabilities from a small reference data set that lacks complete tail information is illustrated with pertussis case count data.
Synthetic data, when properly used, can enhance patterns in real data and thus provide insight into a range of problems. Here, the estimation of tail probabilities of rare events from a moderately large number of observations is considered. The problem is approached through a large number of augmentations, or fusions, of the real data with computer-generated synthetic samples.
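The repeated-fusion idea can be sketched in Python. This is a minimal sketch, not the authors' procedure: it assumes a two-sample density ratio model g(t)/f(t) = exp(alpha + beta * h(t)) with h(t) = t (the model class named in the out-of-sample fusion abstract), fits it through its equivalent logistic regression of the sample label on h(t), and reads the reference tail probability off the fitted point masses on the fused sample. The function names (`fit_logistic`, `drm_exceedance`, `repeated_fusion`), the N(0, 1) stand-in for real data, the N(1, 1) synthetic generator, and all sample sizes are hypothetical.

```python
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(t, d, h, iters=100, tol=1e-10):
    """Newton-Raphson fit of P(D = 1 | t) = sigmoid(a + b * h(t))."""
    a, b = 0.0, 0.0
    hs = [h(v) for v in t]
    for _ in range(iters):
        g0 = g1 = 0.0          # score vector
        i00 = i01 = i11 = 0.0  # information matrix entries
        for hv, dv in zip(hs, d):
            p = sigmoid(a + b * hv)
            w = p * (1.0 - p)
            g0 += dv - p
            g1 += (dv - p) * hv
            i00 += w
            i01 += w * hv
            i11 += w * hv * hv
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * delta = score.
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def drm_exceedance(x_ref, y_syn, threshold, h=lambda v: v):
    """Estimate P(X > threshold) for the reference distribution f under the
    assumed model g(t)/f(t) = exp(alpha + beta * h(t))."""
    t = list(x_ref) + list(y_syn)
    d = [0] * len(x_ref) + [1] * len(y_syn)  # 0 = real, 1 = synthetic
    # The logistic intercept absorbs log(n1 / n0) into alpha.
    a_star, b = fit_logistic(t, d, h)
    n0 = len(x_ref)
    # Point mass on each fused value: p_i = (1 - fitted P(D = 1 | t_i)) / n0.
    total = 0.0
    for v in t:
        if v > threshold:
            total += (1.0 - sigmoid(a_star + b * h(v))) / n0
    return total


def repeated_fusion(x_ref, threshold, sampler, n_syn, n_rep, seed=0):
    """Fuse x_ref with n_rep independent synthetic samples; one estimate each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        y_syn = [sampler(rng) for _ in range(n_syn)]
        estimates.append(drm_exceedance(x_ref, y_syn, threshold))
    return estimates


rng = random.Random(1)
x_ref = [rng.gauss(0.0, 1.0) for _ in range(50)]  # hypothetical "real" data
# Hypothetical generator: N(1, 1) satisfies the model with h(t) = t
# relative to an N(0, 1) reference.
ests = repeated_fusion(x_ref, 2.5, lambda r: r.gauss(1.0, 1.0),
                       n_syn=200, n_rep=100)
print("largest real value:", round(max(x_ref), 3))
print("median fused estimate of P(X > 2.5):", statistics.median(ests))
```

Because the fused sample contains synthetic values beyond the largest real observation, the estimate can be positive even when no real value exceeds the threshold. How close it is to the truth depends on the density ratio model actually holding for the chosen generator; here that is true by construction.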
Often in food safety and biosurveillance, it is desirable to estimate the probability that a contaminant, or a function thereof, exceeds an unsafe high threshold. The probability in question is very small, and estimating it requires information about large values.
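Why information about large values is needed can be seen from the plain empirical estimate, the proportion of observations above the threshold, which drops to zero once the threshold exceeds the largest observation. A minimal sketch; the readings and the function name `empirical_exceedance` are hypothetical.

```python
def empirical_exceedance(sample, threshold):
    """Proportion of observations strictly above threshold."""
    return sum(1 for v in sample if v > threshold) / len(sample)


levels = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2]  # hypothetical contaminant readings
print(empirical_exceedance(levels, 1.0))  # 3 of 6 readings exceed 1.0 -> 0.5
print(empirical_exceedance(levels, 2.0))  # no reading exceeds 2.0 -> 0.0
```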
In US states with small subpopulations, observed mortality rates are often zero, particularly at young ages. Because death rates in life tables are mostly reported on a log scale, zero mortality rates are problematic. To overcome the problem of observed zero death rates, appropriate probability models are used.
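How often a zero count arises in a small population can be illustrated under a Poisson assumption for death counts. This is an illustrative assumption only; the abstract does not say which probability models are used. The rate, exposure, and function names are hypothetical.

```python
import math


def prob_zero_deaths(rate, exposure):
    """P(D = 0) when deaths D ~ Poisson(rate * exposure) (assumed model)."""
    return math.exp(-rate * exposure)


def log_crude_rate(deaths, exposure):
    """Log of deaths / exposure; undefined (None) when no deaths are observed."""
    if deaths == 0:
        return None
    return math.log(deaths / exposure)


# Hypothetical small age group: true rate 5 per 10,000, 1,000 person-years.
print(prob_zero_deaths(0.0005, 1000))  # exp(-0.5), about 0.607
print(log_crude_rate(0, 1000))         # None: log(0) is undefined
```

Under these numbers a zero count is more likely than not, even though the true rate is positive, so the crude log rate is frequently undefined.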
The probability that mortality from certain causes exceeds high thresholds is addressed. An out-of-sample fusion method is presented in which an original real data sample is fused, or combined, with independent computer-generated samples to estimate exceedance probabilities under a density ratio model. Because the combined sample of real and artificial data is larger than the real sample alone, the fused sample yields shorter confidence intervals than traditional methods.
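In the simplest case of one real sample and one synthetic sample, the standard two-sample form of the density ratio model and its semiparametric estimator can be written as follows. This is a sketch of the usual formulation, assuming a known function h; the abstract does not give the exact form used, and several synthetic samples would add one ratio term per sample. At the maximum likelihood estimate the point masses sum to one, so the upper tail is the total mass on fused values above T, which can include synthetic values beyond the largest real observation.

```latex
% Real x_1, ..., x_{n_0} ~ f (reference); synthetic y_1, ..., y_{n_1} ~ g
\frac{g(t)}{f(t)} = \exp\{\alpha + \beta\, h(t)\}

% Fused sample t_1, ..., t_n with n = n_0 + n_1 and \rho = n_1 / n_0
\hat{p}_i = \frac{1}{n_0\,\bigl[1 + \rho\, \exp\{\hat{\alpha} + \hat{\beta}\, h(t_i)\}\bigr]},
\qquad i = 1, \dots, n

\hat{F}(t) = \sum_{i=1}^{n} \hat{p}_i\, I(t_i \le t),
\qquad
\widehat{P}(X > T) = 1 - \hat{F}(T) = \sum_{i=1}^{n} \hat{p}_i\, I(t_i > T)
```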
Int J Health Geogr
December 2009
Background: A semiparametric density ratio method, which borrows strength from two or more samples, can be applied to moving windows of variable size in cluster detection. The method requires neither prior knowledge of the underlying distribution nor the number of cases before scanning. In this paper, the semiparametric cluster detection procedure is combined with Storey's q-value, a false discovery rate (FDR) control method, to account for the multiple testing problem induced by overlapping scanning windows.
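Turning one p-value per scanning window into q-values can be sketched with the simple fixed-lambda form of Storey's estimator: estimate the proportion of true nulls pi0 from p-values above lambda, then take the running minimum of pi0 * m * p_(i) / i from the largest p-value down. This is a minimal sketch, not the paper's implementation; lambda = 0.5, the p-values, and the function name are hypothetical, and the original q-value software can choose lambda adaptively.

```python
def storey_qvalues(pvals, lam=0.5):
    """Storey-type q-values with a fixed tuning parameter lam for pi0."""
    m = len(pvals)
    # Estimated proportion of true null hypotheses, capped at 1.
    pi0 = min(1.0, sum(1 for p in pvals if p > lam) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


# Hypothetical p-values, one per scanning window.
pvals = [0.001, 0.01, 0.03, 0.04, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9]
qvals = storey_qvalues(pvals)
flagged = [i for i, qv in enumerate(qvals) if qv <= 0.05]
print(flagged)  # [0, 1]
```

Here four p-values exceed 0.5, so pi0 = 4 / (10 * 0.5) = 0.8, and only the first two windows have q-values at or below 0.05 (0.008 and 0.04).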
Bivariate semiparametric inference based on a two-dimensional density ratio model is discussed and applied to testing the significance of risk factors for testicular germ cell tumors. The results of a joint analysis of height and weight data from a case-control study show that these two factors are jointly significant, whereas body mass index, a function of height and weight, is not a significant risk factor.
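The contrast between the joint test and the BMI test can be made concrete with one possible two-dimensional density ratio model. This sketch assumes a linear h(x) = (x_1, x_2), takes f as the control density and g as the case density, and is not necessarily the specification used in the study. Under the null hypothesis g = f (alpha is then forced to zero because both densities integrate to one); the joint test uses both coordinates, whereas a univariate model in BMI collapses them into a single ratio, so the two analyses can reach different conclusions.

```latex
% x = (x_1, x_2)^T: height and weight; f = controls, g = cases (assumed labeling)
\frac{g(x_1, x_2)}{f(x_1, x_2)} = \exp\{\alpha + \beta_1 x_1 + \beta_2 x_2\}

% Joint null hypothesis: neither height nor weight shifts the case distribution
H_0:\ \beta_1 = \beta_2 = 0

% BMI is a single function of the two variables (weight in kg, height in m)
\mathrm{BMI} = \frac{x_2}{x_1^{2}}
```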
Escherichia coli K (JM109) and E. coli B (BL21) are strains used routinely for recombinant protein production. These two strains grow and respond differently to environmental factors such as glucose and oxygen concentration.