NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.

BMC Bioinformatics

Department of Computer Science and Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong.

Published: September 2016

Background: RNA-sequencing (RNA-Seq) has become a powerful technology for characterizing gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods developed for microarray data can be applied to RNA-Seq data, they are not ideal because RNA-Seq data are discrete counts. The Poisson distribution and the negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may be less appropriate than the negative binomial distribution when biological replicates are available and the data are overdispersed (i.e., when the variance is larger than the mean). However, negative binomial variables are more complicated to model because they involve a dispersion parameter that must be estimated.
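The contrast between the two count models can be made concrete with a small sketch. It assumes the common mean-dispersion parameterization of the negative binomial (mean `mu`, dispersion `phi`, variance `mu + phi * mu**2`); the abstract does not state which parameterization the paper uses, and the function names are illustrative only.

```python
import math


def poisson_log_pmf(x, mu):
    """Log-probability of count x under a Poisson model with mean mu."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def nb_log_pmf(x, mu, phi):
    """Log-probability of count x under a negative binomial model.

    Assumed parameterization: mean mu, dispersion phi > 0,
    so that the variance is mu + phi * mu**2.
    """
    r = 1.0 / phi
    return (
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + x * math.log(phi * mu / (1.0 + phi * mu))
        - r * math.log(1.0 + phi * mu)
    )


def nb_variance(mu, phi):
    """Variance under the assumed parameterization; exceeds mu whenever phi > 0."""
    return mu + phi * mu ** 2
```

Under this parameterization the variance always exceeds the mean when `phi > 0`, and the negative binomial log-probability approaches the Poisson one as `phi` shrinks toward zero, which is why the Poisson model can be viewed as the no-overdispersion limit.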

Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications.
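A minimal sketch of how such a Bayes-rule classifier could be scored is given below. The abstract does not state the formula, so everything here is an assumption: the mean of gene g in class k is taken to be `size_factor * lam[g] * d[k][g]` (a parameterization borrowed from the Poisson discriminant setting), `phi[g]` is a per-gene dispersion, and the score is the class-dependent part of the negative binomial log-likelihood plus the log prior. All parameters are passed in as already-estimated values; the plug-in estimation rules mentioned above are not reproduced.

```python
import math


def nb_discriminant_score(x, size_factor, lam, d_k, phi, prior_k):
    """Class-dependent part of the NB log-posterior for one sample (assumed form).

    x           -- gene counts for the sample
    size_factor -- sequencing-depth scaling for the sample (hypothetical name)
    lam         -- per-gene baseline expression
    d_k         -- per-gene fold change for class k
    phi         -- per-gene dispersion (> 0)
    prior_k     -- prior probability of class k
    """
    score = math.log(prior_k)
    for x_g, lam_g, d_kg, phi_g in zip(x, lam, d_k, phi):
        one_plus = 1.0 + size_factor * lam_g * d_kg * phi_g
        score += x_g * (math.log(d_kg) - math.log(one_plus))
        score -= math.log(one_plus) / phi_g
    return score


def classify(x, size_factor, lam, d, phi, priors):
    """Return the index of the class with the highest discriminant score."""
    scores = [
        nb_discriminant_score(x, size_factor, lam, d_k, phi, p)
        for d_k, p in zip(d, priors)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with two genes whose class-specific fold changes are `[[2.0, 0.5], [0.5, 2.0]]`, a sample with a high count for the first gene and a low count for the second is assigned to the first class under these assumptions.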

Conclusions: We have developed a new classifier using the negative binomial model for RNA-Seq data classification. Our simulation results show that the proposed classifier performs better than existing methods. The proposed classifier can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we provide some guidelines to help scientists decide which method to use in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5022247
DOI: http://dx.doi.org/10.1186/s12859-016-1208-1
