Background: RNA-sequencing (RNA-Seq) has become a powerful technology for characterizing gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods developed for microarray data can be applied to RNA-Seq data, they are not ideal because RNA-Seq data are discrete counts. The Poisson and negative binomial distributions are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. However, the Poisson assumption may be less appropriate than the negative binomial distribution when biological replicates are available and the data are overdispersed (i.e., when the variance is larger than the mean). The negative binomial model, in turn, is more complicated to fit because it involves a dispersion parameter that needs to be estimated.
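To make the overdispersion point concrete, the two models can be compared through their mean-variance relationships. The symbols $\mu$ (mean count) and $\phi$ (dispersion) are notation assumed here for illustration, using one common parameterization of the negative binomial; the abstract itself does not fix a parameterization.

```latex
\begin{align*}
X \sim \mathrm{Poisson}(\mu): &\quad \mathbb{E}[X] = \mu, \quad \operatorname{Var}(X) = \mu \\
X \sim \mathrm{NB}(\mu, \phi): &\quad \mathbb{E}[X] = \mu, \quad \operatorname{Var}(X) = \mu + \phi\,\mu^{2}, \quad \phi > 0
\end{align*}
```

Under this parameterization the negative binomial variance always exceeds the mean, and it approaches the Poisson variance as $\phi \to 0$.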
Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. Using Bayes' rule, we construct the classifier by fitting a negative binomial model, and we propose plug-in rules to estimate the unknown parameters in the classifier. We explore the relationship between the negative binomial classifier and the Poisson classifier, with a numerical investigation of the impact of dispersion on the discriminant score. In simulations, the proposed method outperforms existing methods. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications.
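The general idea of a Bayes-rule classifier built on a negative binomial model can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes genes are independent given the class, that per-class means and per-gene dispersions have already been estimated (the plug-in rules are not reproduced here), and that the negative binomial has variance `mu + phi * mu**2`. All function names, parameter names, and numbers are hypothetical.

```python
import math


def nb_log_pmf(x, mu, phi):
    """Log-probability of count x under a negative binomial with mean mu
    and dispersion phi (assumed parameterization: variance = mu + phi * mu**2)."""
    r = 1.0 / phi  # "size" parameter of the negative binomial
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu))
            + x * math.log(mu / (r + mu)))


def poisson_log_pmf(x, mu):
    """Log-probability of count x under a Poisson with mean mu."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def discriminant_scores(x, class_means, dispersions, priors):
    """One score per class: log prior plus the sum of per-gene log-likelihoods.
    By Bayes' rule this equals the log posterior up to a shared constant."""
    scores = []
    for means_k, prior_k in zip(class_means, priors):
        log_lik = sum(nb_log_pmf(xg, mu, phi)
                      for xg, mu, phi in zip(x, means_k, dispersions))
        scores.append(log_lik + math.log(prior_k))
    return scores


def classify(x, class_means, dispersions, priors):
    """Index of the class with the largest discriminant score."""
    scores = discriminant_scores(x, class_means, dispersions, priors)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy parameters (hypothetical): 2 classes, 3 genes.
class_means = [[10.0, 50.0, 5.0], [30.0, 20.0, 5.0]]
dispersions = [0.2, 0.1, 0.5]
priors = [0.5, 0.5]

label_a = classify([12, 45, 4], class_means, dispersions, priors)
label_b = classify([28, 18, 6], class_means, dispersions, priors)
```

In this sketch, letting `phi` shrink toward zero makes `nb_log_pmf` approach `poisson_log_pmf`, which is one simple way to see how a negative binomial classifier relates to a Poisson one.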
Conclusions: We have developed a new classifier based on the negative binomial model for RNA-Seq data classification. Our simulation results show that the proposed classifier performs better than existing methods, so it can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we provide guidelines to help scientists decide which method to use in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5022247 | PMC |
http://dx.doi.org/10.1186/s12859-016-1208-1 | DOI Listing |
J Genet Eng Biotechnol
March 2025
Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh; Department of Molecular Biology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh.
The SQUAMOSA promoter binding protein (SBP) gene family is one of the largest and most significant transcription factor gene families in plants, and its members play critical regulatory roles in floral enhancement, fruit development, and stress resistance. The SBP protein family (also known as SPL) has not yet been thoroughly studied in banana, a staple fruit crop. Banana, a perennial monocot, is essential for ensuring food and nutrition security.
Bioinformatics
March 2025
Department of Statistics, Hunan University, Changsha, 410000, China.
Motivation: Inferring gene networks provides insights into biological pathways and functional relationships among genes. When gene expression samples exhibit heterogeneity, they may originate from unknown subtypes, prompting the use of a mixture Gaussian graphical model for simultaneous subclassification and gene network inference. However, this method overlooks the heterogeneity of network relationships across subtypes and does not sufficiently emphasize shared relationships.
J Immunol
March 2025
Department of Microbiology and Immunology, Emory University School of Medicine, Atlanta, GA, United States.
Antigen-experienced memory B-cells (MBC) are endowed with enhanced functional properties compared to naïve B cells and play an important role in the humoral response. However, the epigenetic enzymes and programs that govern their rapid differentiation are incompletely understood. Here, the role of the histone H3 lysine 27 methyltransferase EZH2 in the formation of MBC in response to an influenza infection was determined in Mus musculus.
Elife
March 2025
Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States.
Expression quantitative trait loci (eQTLs) provide a key bridge between noncoding DNA sequence variants and organismal traits. The effects of eQTLs can differ among tissues, cell types, and cellular states, but these differences are obscured by gene expression measurements in bulk populations. We developed a one-pot approach to map eQTLs in by single-cell RNA sequencing (scRNA-seq) and applied it to over 100,000 single cells from three crosses.
STAR Protoc
March 2025
Liaoning Key Laboratory of Economic and Applied Entomology, College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China; Shenyang Key Laboratory of Surveillance and Management for Vegetable Diseases and Insect Pests, College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China.
Bacteriocytes are specialized insect cells adapted to harbor symbionts. However, their low number in individual whiteflies makes obtaining enough for transcriptome sequencing challenging. Here, we present a protocol for the isolation of whitefly bacteriocytes.