J R Stat Soc Series B Stat Methodol
September 2024
Motivated by applications in text mining and discrete distribution inference, we test for equality of probability mass functions of groups of high-dimensional multinomial distributions. Special cases of this problem include global testing for topic models, two-sample testing in authorship attribution, and closeness testing for discrete distributions. A test statistic, which is shown to have an asymptotic standard normal distribution under the null hypothesis, is proposed.
Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest.
The use of external controls in genome-wide association studies (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, aggregating controls from multiple sources is challenging because of batch effects, difficulty in identifying genotyping errors, and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed.
Background: Prior work has demonstrated how neighborhood poverty and racial composition impact racial disparities in access to the deceased donor kidney transplant waitlist, both nationally and regionally. We examined the association between neighborhood characteristics and racial disparities in time to transplant waitlist in Chicago, a diverse city with continued neighborhood segregation.
Methods: Using data from the United States Renal Data System (USRDS) and the US Census, we investigated time from dialysis initiation to kidney transplant waitlisting for African American and white patients in Chicago using cause-specific proportional hazards analyses, adjusting for individual sociodemographic and clinical characteristics, as well as neighborhood poverty and racial composition.
We propose a novel Rayleigh-quotient-based sparse quadratic dimension reduction method, named QUADRO (Quadratic Dimension Reduction via Rayleigh Optimization), for analyzing high-dimensional data. Unlike in the linear setting, where Rayleigh quotient optimization coincides with classification, these two problems are very different in nonlinear settings. In this paper, we clarify this difference and show that Rayleigh quotient optimization may be of independent scientific interest.