Profiling tumors with single-cell RNA sequencing has the potential to identify recurrent patterns of transcription variation related to cancer progression, and to produce therapeutically relevant insights. However, strong intertumor heterogeneity can obscure more subtle patterns that are shared across tumors. Here we introduce a statistical method, generalized binary covariance decomposition (GBCD), to address this problem.
Crohn's disease (CD) is a complex inflammatory bowel disease resulting from an interplay of genetic, microbial, and environmental factors. Cell-type-specific contributions to CD etiology and genetic risk are incompletely understood. Here we built a comprehensive atlas of cell-type-resolved chromatin accessibility comprising 557,310 candidate cis-regulatory elements (cCREs) in terminal ileum and ascending colon from patients with active and inactive CD and healthy controls.
Summary: Motivated by theoretical and practical issues that arise when applying principal component analysis (PCA) to count data, Townes et al. introduced "Poisson GLM-PCA", a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (scRNA-seq) data. However, fitting GLM-PCA is computationally challenging.
Summary: Motivated by theoretical and practical issues that arise when applying principal components analysis (PCA) to count data, Townes et al. introduced "Poisson GLM-PCA", a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (RNA-seq) data. However, fitting GLM-PCA is computationally challenging. Here we study this problem and show that a simple algorithm, which we call "Alternating Poisson Regression" (APR), produces better-quality fits, in less time, than existing algorithms.
Sepsis is a systemic response to infection with life-threatening consequences. Our understanding of the molecular and cellular impact of sepsis across organs remains rudimentary. Here, we characterize the pathogenesis of sepsis by measuring dynamic changes in gene expression across organs.
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups.
Profiling tumors with single-cell RNA sequencing (scRNA-seq) has the potential to identify recurrent patterns of transcription variation related to cancer progression, and produce new therapeutically relevant insights. However, the presence of strong inter-tumor heterogeneity often obscures more subtle patterns that are shared across tumors, some of which may characterize clinically relevant disease subtypes. Here we introduce a new statistical method, generalized binary covariance decomposition (GBCD), to address this problem.
We introduce mvSuSiE, a multi-trait fine-mapping method for identifying putative causal variants from genetic association data (individual-level or summary data). mvSuSiE learns patterns of shared genetic effects from data, and exploits these patterns to improve power to identify causal SNPs. Comparisons on simulated data show that mvSuSiE is competitive in speed, power and precision with existing multi-trait methods, and uniformly improves on single-trait fine-mapping (SuSiE) in each trait separately.
Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve prediction accuracy by leveraging effects that are shared across phenotypes.
Sepsis is a systemic response to infection with life-threatening consequences. Our understanding of the impact of sepsis across organs of the body is rudimentary. Here, using mouse models of sepsis, we generate a dynamic, organism-wide map of the pathogenesis of the disease, revealing the spatiotemporal patterns of the effects of sepsis across tissues.
In recent work, Wang et al. introduced the "Sum of Single Effects" (SuSiE) model, and showed that it provides a simple and efficient approach to fine-mapping genetic variants from individual-level data. Here we present new methods for fitting the SuSiE model to summary data, for example to single-SNP z-scores from an association study and linkage disequilibrium (LD) values estimated from a suitable reference panel. To develop these new methods, we first describe a simple, generic strategy for extending any individual-level data method to deal with summary data.
J R Stat Soc Series B Stat Methodol
December 2020
We introduce a simple new approach to variable selection in linear regression, with a particular focus on . The approach is based on a new model - the "Sum of Single Effects" (SuSiE) model - which comes from writing the sparse vector of regression coefficients as a sum of "single-effect" vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure - Iterative Bayesian Stepwise Selection (IBSS) - which is a Bayesian analogue of stepwise selection methods.
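The "sum of single effects" construction described in this abstract can be illustrated with a minimal sketch. It only builds a sparse coefficient vector as a sum of single-effect vectors; it does not implement the IBSS fitting procedure. The function names, the Gaussian effect sizes, and the dimensions are assumptions chosen for illustration and are not taken from the paper.

```python
import random


def single_effect_vector(p, rng):
    """Return a length-p vector with exactly one randomly chosen entry set.

    Hypothetical helper: the position and the standard-normal effect size
    are illustrative assumptions, not the paper's prior.
    """
    vec = [0.0] * p
    idx = rng.randrange(p)           # which variable carries this effect
    vec[idx] = rng.gauss(0.0, 1.0)   # its effect size
    return vec


def sum_of_single_effects(p, num_effects, seed=0):
    """Build a coefficient vector b as the sum of single-effect vectors."""
    rng = random.Random(seed)
    effects = [single_effect_vector(p, rng) for _ in range(num_effects)]
    # Element-wise sum across the single-effect vectors.
    b = [sum(column) for column in zip(*effects)]
    return effects, b


effects, b = sum_of_single_effects(p=10, num_effects=3)
print(b)
```

Because each single-effect vector contributes at most one non-zero entry, the resulting vector `b` has at most `num_effects` non-zero coefficients, which is what makes the sum sparse.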
Making scientific analyses reproducible, well documented, and easily shareable is crucial to maximizing their impact and ensuring that others can build on them. However, accomplishing these goals is not easy, requiring careful attention to organization, workflow, and familiarity with tools that are not a regular part of every scientist's toolbox. We have developed an R package, , to help all scientists, regardless of background, overcome these challenges.
We introduce new statistical methods for analyzing genomic data sets that measure many effects in many conditions (for example, gene expression changes under many treatments). These new methods improve on existing methods by allowing for arbitrary correlations in effect sizes among conditions. This flexible approach increases power, improves effect estimates and allows for more quantitative assessments of effect-size heterogeneity compared to simple shared or condition-specific assessments.
Genomic selection has been proposed as the standard method to predict breeding values in animal and plant breeding. Although some crops have benefited from this methodology, studies in Coffea are still emerging. To date, there have been no studies describing how well genomic prediction models work across populations and environments for different complex traits in coffee.
The genetics underlying variation in health-related musculoskeletal phenotypes can be investigated in a mouse model. Quantitative trait loci (QTLs) affecting musculoskeletal traits in the LG/J and SM/J strain lineage remain to be refined and corroborated. The aim of this study was to map muscle and bone traits in males (n = 506) of the 50th filial generation of advanced intercross lines (LG/SM AIL) derived from the two strains.
Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records.
Genome-wide association studies (GWASs) have identified numerous loci that influence risk for psychiatric diseases. Genetically engineered mice are often used to characterize genes implicated by GWASs. These studies are based on the assumption that observed genotype-phenotype relationships will generalize to humans, implying that the results would at least generalize to other inbred mouse strains.
Genetic association mapping in structured populations of model organisms can offer a fruitful complement to human genetic studies by generating new biological hypotheses about complex traits. Here we investigated prepulse inhibition (PPI), a measure of sensorimotor gating that is disrupted in a number of psychiatric disorders. To identify genes that influence PPI, we constructed a panel of half-sibs by crossing 30 females from common inbred mouse strains with inbred C57BL/6J males to create male and female F1 offspring.