Background: The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows, and windowed read-count data are generated for the entire genome, from which genomic signals are detected (e.g., copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution, a combination that allows simultaneous bias correction and signal detection. However, although statistically powerful, the GLM+NB method has quadratic computational complexity and therefore runs slowly when applied to large-scale windowed read-count data. In this study, we aimed to substantially speed up the GLM+NB method by using a randomized algorithm, and we demonstrate the utility of our approach by detecting copy number variants (CNVs) in a real example.
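To make the windowing step concrete, the following is a minimal sketch of how windowed read counts might be produced from read start positions on a single chromosome. The function name `window_counts`, the 0-based coordinate convention, and the fixed, non-overlapping window size are assumptions made for illustration; they are not taken from the study.

```python
def window_counts(read_starts, chrom_length, window_size):
    """Count reads per fixed-size, non-overlapping window.

    Assumes 0-based read start positions on one chromosome.
    Reads that start outside [0, chrom_length) are ignored.
    """
    # Ceiling division so a final partial window is still represented.
    n_windows = -(-chrom_length // window_size)
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window_size] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical read starts on a 1,000 bp chromosome, 100 bp windows.
    print(window_counts([0, 5, 99, 100, 250, 999], 1000, 100))
```

Each entry of the returned list is the read count for one window; vectors of this kind are the response variable that a GLM+NB analysis would model.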
Results: We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB-based method for detecting CNVs, and named the resulting method "R-GENSENG". Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG's accuracy in CNV detection.
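The idea of sampling the windows and estimating on a smaller problem can be sketched as follows. This is an illustration under stated assumptions, not the RGE algorithm from the paper: it assumes uniform sampling without replacement, a single hypothetical covariate `x` (for example, a GC-content score per window), a log link, and a known, fixed NB dispersion `alpha`. The names `fit_nb_glm`, `randomized_estimate`, and `simulate_counts` are invented for this example.

```python
import math
import random


def fit_nb_glm(x, y, alpha, iters=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by IRLS, with NB variance mu + alpha*mu^2.

    Assumption: the dispersion alpha is known and held fixed.
    """
    b0, b1 = math.log(max(sum(y) / len(y), 1e-8)), 0.0
    for _ in range(iters):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            w = mu / (1.0 + alpha * mu)      # IRLS weight for the log link
            z = eta + (yi - mu) / mu         # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least-squares normal equations.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1


def randomized_estimate(x, y, m, alpha, rng):
    """Fit the same model on m windows sampled uniformly without replacement."""
    idx = rng.sample(range(len(x)), m)
    return fit_nb_glm([x[i] for i in idx], [y[i] for i in idx], alpha)


def simulate_counts(n, b0, b1, alpha, rng):
    """Simulate NB counts as a gamma-Poisson mixture (illustrative data only)."""
    xs, ys = [], []
    for _ in range(n):
        xi = rng.random()
        mu = math.exp(b0 + b1 * xi)
        lam = rng.gammavariate(1.0 / alpha, alpha * mu)
        # Knuth's Poisson sampler; adequate for the small rates used here.
        limit, k, p = math.exp(-lam), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        xs.append(xi)
        ys.append(k)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = simulate_counts(20000, 2.0, 0.5, 0.1, rng)
    print("full data :", fit_nb_glm(x, y, 0.1))
    print("subsample :", randomized_estimate(x, y, 2000, 0.1, rng))
```

On simulated data of this kind, the subsample fit should land close to the full-data fit while touching only a fraction of the windows, which is the trade-off a randomized estimator exploits. The actual sampling scheme and dispersion handling in RGE may differ from this sketch.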
Conclusions: Our results suggest that the RGE strategy developed here could be applied to other GLM+NB-based read-count analyses, e.g., ChIP-seq data analysis, to substantially improve their computational efficiency while preserving their analytic power.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5831535 | PMC |
http://dx.doi.org/10.1186/s12859-018-2077-6 | DOI Listing |
Nucleic Acids Res
January 2025
Bioinformatics Division, WEHI, Parkville, VIC 3052, Australia.
edgeR is an R/Bioconductor software package for differential analyses of sequencing data in the form of read counts for genes or genomic features. Over the past 15 years, edgeR has been a popular choice for statistical analysis of data from sequencing technologies such as RNA-seq or ChIP-seq. edgeR pioneered the use of the negative binomial distribution to model read count data with replicates and the use of generalized linear models to analyze complex experimental designs.
Sci Data
January 2025
Department of Molecular Science and Technology, Ajou University, Suwon, 16499, Republic of Korea.
Chinese hamster ovary (CHO) cells play a pivotal role in the production of recombinant therapeutics. In the present study, we conducted a genome-scale pooled CRISPR knockout (KO) screening using a virus-free, recombinase-mediated cassette exchange-based platform in CHO-K1 host and CHO-K1 derived recombinant cells. Genome-wide guide RNA (gRNA) amplicon sequencing data were generated from cell libraries, as well as short- and long-term KO libraries, and validated through phenotypic assessment and gRNA read count distribution.
J Clin Microbiol
December 2024
Department of Veterinary Microbiology, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.
Bovine reproductive failure, which includes infertility, abortion, and stillbirth in cattle, leads to significant economic losses for beef and milk producers. Diagnosing the infectious causes of bovine reproductive failure is challenging as there are multiple pathogens associated with it. The traditional stepwise approach to diagnostic testing is time-consuming and can cause significant delays.
Infect Agent Cancer
November 2024
Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 99, Aarhus N, 8200, Denmark.
Nucleic Acids Res
January 2025
Centre for Computational Biology and Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore.
Single-cell RNA sequencing (scRNA-seq) has emerged as the key technique for studying transcriptomics at the single-cell level. In our previous work, we presented the DISCO database (https://www.immunesinglecell.