This article proposes an empirical solution to the problem of how many clusters of complex samples should be selected to construct the training set for a universal near-infrared (NIR) quantitative model based on the Naes method. The sample spectra were hierarchically classified into clusters using Ward's algorithm with Euclidean distance. With the sample spectra first classified into two clusters, 1/50 of the largest heterogeneity value in the cluster with the larger variation was set as the threshold that determines the total number of clusters. One sample was then randomly selected from each cluster to construct the training set, so the number of samples in the training set equaled the number of clusters. In this study, the strategy was applied to 98 batches of rifampicin capsules with API contents ranging from 50.1% to 99.4%. For the rifampicin capsule model, the root mean square errors of cross-validation and prediction were 2.54% and 2.31%, respectively. The model was then evaluated in terms of outlier diagnostics, accuracy, precision, and robustness. The same training set sample selection strategy was also used to revalidate the models for cefradine capsules, roxithromycin tablets, and erythromycin ethylsuccinate tablets, with satisfactory results. In conclusion, all results showed that this training set sample selection strategy assists in the quick and accurate construction of quantitative models using near-infrared spectroscopy.
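
The selection strategy described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses SciPy's Ward linkage, treats the linkage merge height as a stand-in for the "heterogeneity" value (the spectroscopy software used in the study may scale it differently), and the function name `select_training_indices` and its parameters `divisor` and `seed` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def select_training_indices(spectra, divisor=50.0, seed=0):
    """Pick one sample index per Ward cluster (illustrative sketch only)."""
    spectra = np.asarray(spectra, dtype=float)
    n = spectra.shape[0]

    # Ward linkage on Euclidean distances between spectra (rows = samples).
    Z = linkage(spectra, method="ward", metric="euclidean")

    # The last row of Z is the merge that joins the two top-level clusters.
    left, right = int(Z[-1, 0]), int(Z[-1, 1])

    def height(node):
        # Assumption: the merge height of a cluster stands in for its
        # largest "heterogeneity" value; a single sample has height 0.
        return Z[node - n, 2] if node >= n else 0.0

    # Threshold = 1/divisor of the height of the more variable cluster.
    threshold = max(height(left), height(right)) / divisor

    # Cut the dendrogram at the threshold to fix the number of clusters.
    labels = fcluster(Z, t=threshold, criterion="distance")

    # Randomly draw one sample from each resulting cluster.
    rng = np.random.default_rng(seed)
    chosen = [
        int(rng.choice(np.flatnonzero(labels == c))) for c in np.unique(labels)
    ]
    return sorted(chosen), labels
```

Calling `select_training_indices(X)` on an array `X` with one spectrum per row returns the selected row indices and the cluster label of every sample; the number of selected samples equals the number of clusters, as in the strategy above.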

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3134668
DOI: http://dx.doi.org/10.1208/s12249-011-9638-6
