The dramatic increase in the complexity of flow cytometric datasets requires new computational approaches that can maximize the amount of information derived and overcome the limitations of traditional gating strategies. Herein, we present a multivariate computational analysis of the flow cytometry datasets from HIV-infected individuals that were provided as part of the FlowCAP-IV Challenge, using unsupervised and supervised learning techniques. Out of 383 samples (stimulated and unstimulated), 191 samples were used as a training set (34 individuals whose disease did not progress, and 157 individuals whose disease did progress). Using the results from the training set, the participants in the Challenge were then asked to predict the condition and progression time of the remaining individuals (45 "nonprogressors" and 147 "progressors"). To achieve this, we first scaled down data resolution and then excluded doublet cells from the analysis using Expectation Maximization approaches. We then standardized all samples into histograms and used a Genetic Algorithm-Neural Network approach to extract feature sets from the datasets, the reliability of which was examined using WEKA-implemented classifiers. The selected feature set resulted in high sensitivity and specificity for the discrimination of progressors and nonprogressors in the training set (average True Positive Rate = 1.00 and average False Positive Rate = 0.033). The capacity of the feature set to predict actual survival time was better when using data from the "unstimulated" training set (r = 0.825). In the test set, the P-values and log-rank ratios (with 95% confidence intervals) between actual and predicted survival time were 0.682 and 0.9542 ± 0.24 for the unstimulated dataset, and 0.4451 and 0.9173 ± 0.23 for the stimulated dataset. Our analytic strategy has demonstrated a promising capacity to extract useful information from complex flow cytometry datasets, despite a significant imbalance and variation between the training and test sets.
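Two of the steps named above, standardizing a sample into a histogram and scoring a classifier by True Positive Rate and False Positive Rate, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `to_histogram` and `rates`, the equal-width binning, the clipping of out-of-range values, and the toy labels are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def to_histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Bin values into equal-width bins on [lo, hi] and return bin fractions.

    Assumption: out-of-range values are clipped to the nearest edge, and
    counts are divided by the number of events so that samples with
    different event counts become comparable.
    """
    if bins <= 0 or hi <= lo:
        raise ValueError("need bins > 0 and hi > lo")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        clipped = min(max(v, lo), hi)
        # The upper edge (clipped == hi) falls into the last bin.
        idx = min(int((clipped - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


def rates(actual: Sequence[str], predicted: Sequence[str], positive: str) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for one class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical toy data, not taken from the FlowCAP-IV datasets.
hist = to_histogram([0.1, 0.2, 0.9, 1.5], lo=0.0, hi=1.0, bins=2)
tpr, fpr = rates(
    actual=["P", "P", "P", "N", "N"],
    predicted=["P", "P", "N", "P", "N"],
    positive="P",
)
```

Normalizing each histogram to fractions is one simple way to make samples of different sizes comparable; the abstract does not state which standardization the authors used.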

Source
http://dx.doi.org/10.1002/cyto.a.22622
