The metazoan genome is compartmentalized in areas of highly interacting chromatin known as topologically associating domains (TADs). TADs are demarcated by boundaries mostly conserved across cell types and even across species. However, a genome-wide characterization of TAD boundary strength in mammals is still lacking.
View Article and Find Full Text PDFWe develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton-Watson processes.
View Article and Find Full Text PDFGenome-wide association studies have revealed individual genetic variants associated with phenotypic traits such as disease risk and gene expressions. However, detecting pairwise interaction effects of genetic variants on traits still remains a challenge due to a large number of combinations of variants (∼10(11) SNP pairs in the human genome), and relatively small sample sizes (typically <10(4)). Despite recent breakthroughs in detecting interaction effects, there are still several open problems, including: (1) how to quickly process a large number of SNP pairs, (2) how to distinguish between true signals and SNPs/SNP pairs merely correlated with true signals, (3) how to detect nonlinear associations between SNP pairs and traits given small sample sizes, and (4) how to control false positives.
View Article and Find Full Text PDFAccurate prediction of complex traits based on whole-genome data is a computational problem of paramount importance, particularly to plant and animal breeders. However, the number of genetic markers is typically orders of magnitude larger than the number of samples (p >> n), amongst other challenges. We assessed the effectiveness of a diverse set of state-of-the-art methods on publicly accessible real data.
View Article and Find Full Text PDF