A common goal in chemistry is to study the relationship between a measured signal and the variability of certain factors. To this end, researchers often use Design of Experiments to decide which experiments to conduct, and (Multiple) Linear Regression and/or Analysis of Variance to analyze the collected data. One of the assumptions at the very foundation of this strategy is that all experiments are independent, conditional on the settings of the factors.
River water is an important source of Dutch drinking water. For this reason, continuous monitoring of river water quality is needed. However, comprehensive chemical analyses with high-resolution gas chromatography [GC]-mass spectrometry [MS]/liquid chromatography [LC]-MS are tedious and time-consuming; this makes them poorly suited to routine water quality monitoring and, as a result, many pollution events are missed.
The development of portable NIR instruments facilitates widespread use among non-specialists. However, untrained operators may follow non-optimal measurement procedures. This work investigates how different factors in the measurement procedure influence the spectra of pig feed samples produced by SCiO, a handheld NIR.
For the extraction of spatially important regions from mass spectrometry imaging (MSI) data, different clustering methods have been proposed. These clustering methods are based on certain assumptions and use different criteria to assign pixels to different classes. For high-dimensional MSI data, the curse of dimensionality also limits the performance of clustering methods, a limitation that is usually overcome by pre-processing the data with dimension reduction techniques.
Purpose: To evaluate the value of a convolutional neural network (CNN) in the diagnosis of human brain tumor or Alzheimer's disease by MR spectroscopic imaging (MRSI) and to compare its Matthews correlation coefficient (MCC) score against that of other machine learning methods and a previous evaluation of the same data. We address two challenges: 1) the limited number of cases in MRSI datasets and 2) the interpretability of results in the form of relevant spectral regions.
Methods: A shallow CNN with only one hidden layer and an ad hoc loss function was constructed, with two branches for processing the spectral and image features of a brain voxel, respectively.
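For readers unfamiliar with the MCC score named in the abstract above, the following is a minimal sketch of the standard binary definition, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name and the 0/1 label encoding are illustrative assumptions; the snippet above does not say how the score was computed in the study, nor whether a multiclass variant was used.

```python
import math


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0/1.

    Hypothetical helper for illustration only; not taken from the cited work.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The score ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect agreement). Because it uses all four confusion-matrix cells, it is often preferred over accuracy when classes are imbalanced.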
Many industries see a shifting focus towards performing on-site analysis using handheld spectroscopic devices. A determining factor for decision-making on the commissioning of these devices is available information on the potential performance of the device for specific applications. By now, myriad handheld solutions with very different specifications and pricing are available on the market.
The long-term prediction performance of spectroscopic calibration models is a critical factor in monitoring or controlling many production processes. Over time, new variations may emerge that deteriorate prediction performance. Therefore, models have to be maintained to retain or improve their prediction performance through time, requiring considerable resources and data.
The rapid evolution of the flow cytometry field, currently allowing the measurement of 30-50 parameters per cell, has led to a marked increase in deep multivariate information. Manual gating is insufficient to extract all this information. Therefore, multivariate analysis (MVA) methods have been developed to extract information and efficiently analyze the high-density multicolour flow cytometry (MFC) data.
Background: Drug mass spectrometry imaging (MSI) data contain information about the drug and several other molecular ions present in a biological sample. However, a proper approach to fully explore the potential of this type of data is still missing. Therefore, a computational pipeline that combines different spatial and non-spatial methods is proposed to link the observed drug distribution profile with tumor heterogeneity in solid tumors.
Flow Cytometry is an analytical technology to simultaneously measure multiple markers per single cell. Tens of thousands to millions of single cells can be measured per sample, and each sample may contain a different number of cells. All samples may be bundled together, leading to a 'multi-set' structure.
Diffuse reflectance near-infrared (NIR) data (908-1676 nm) of chicken breast fillets was recorded in a non-destructive way using a portable miniaturised NIR spectrometer. The NIR data was used to discriminate between fresh and thawed breast fillets and to determine the birds' growth conditions. NIR data was recorded for 153 commercial supermarket chicken fillet samples by applying the NIR device, equipped with the standard-issue collar, to the samples in three different ways: (i) directly on the meat, (ii) through the top foil of the package (i.
Combining the individual analytical strengths of mass spectrometry and infrared spectroscopy, infrared ion spectroscopy is increasingly recognized as a powerful tool for small-molecule identification in a wide range of analytical applications. Mass spectrometry is itself a leading analytical technique for small-molecule identification on the merit of its outstanding sensitivity, selectivity and versatility. The foremost shortcoming of the technique, however, is its limited ability to directly probe molecular structure, especially when contrasted against spectroscopic techniques.
Multicolour flow cytometry (MFC) is used to measure multiple cellular markers at the single-cell level. Cellular markers may be coloured with different panels of fluorescently-labelled antibodies to enable cell identification or the detection of activated cells in pre-defined, 'gated' specific cell subsets. The number of markers that can be used per measurement is technologically limited, however, requiring every panel to be analysed in a separate aliquot measurement.
Multicolor Flow Cytometry (MFC)-based gating allows the selection of cellular (pheno)types based on their unique marker expression. Current manual gating practice is highly subjective and may remove relevant information, precluding the discovery of cell populations with specific co-expression of multiple markers. Only multivariate approaches can extract such aspects of cell variability from multi-dimensional MFC data.
The calibration performance of Partial Least Squares regression (PLS) can be improved by eliminating uninformative variables. For PLS, many variable elimination methods have been developed. One is the Uninformative-Variable Elimination for PLS (UVE-PLS).
Multicolour Flow Cytometry (MFC) produces multidimensional analytical data on the quantitative expression of multiple markers on single cells. This data contains invaluable biomedical information on (1) the marker expressions per cell, (2) the variation in such expression across cells, (3) the variability of cell marker expression across samples that (4) may vary systematically between cells collected from donors and patients. Current conventional and even advanced data analysis methods for MFC data explore only a subset of these levels.
Revealing the biochemistry associated with micro-organismal interspecies interactions is highly relevant for many purposes. Each pathogen has a characteristic metabolic fingerprint that allows identification based on its unique multivariate biochemistry. When pathogen species come into mutual contact, their co-culture will display a chemistry that may be attributed both to mixing of the characteristic chemistries of the mono-cultures and to competition between the pathogens.
In this work we show that convolutional neural networks (CNNs) can be efficiently used to classify vibrational spectroscopic data and identify important spectral regions. CNNs are the current state of the art in image classification and speech recognition and can learn interpretable representations of the data. These characteristics make CNNs a good candidate for reducing the need for preprocessing and for highlighting important spectral regions, both of which are crucial steps in the analysis of vibrational spectroscopic data.
The aim of data preprocessing is to remove data artifacts (such as a baseline, scatter effects or noise) and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but selecting which method or combination of methods to use for the specific data being analyzed is difficult. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames.
Historically, advances in the field of ion mobility spectrometry have been hindered by the variation in measured signals between instruments developed by different research laboratories or manufacturers. This has triggered the development and application of chemometric techniques able to reveal and analyze precious information content of ion mobility spectra. Recent advances in multidimensional coupling of ion mobility spectrometry to chromatography and mass spectrometry have created new, unique challenges for data processing, yielding high-dimensional, megavariate datasets.
Background: Genomic prediction (GP) allows breeders to select plants and animals based on their breeding potential for desirable traits, without lengthy and expensive field trials or progeny testing. We have proposed to use Dissimilarity-based Partial Least Squares (DPLS) for GP. As a case study, we use the DPLS approach to predict Bacterial wilt (BW) in tomatoes using SNPs as predictors.
Current challenges of clinical breath analysis include large data size and non-clinically relevant variations observed in exhaled breath measurements, which should be urgently addressed with competent scientific data tools. In this study, three different baseline correction methods are evaluated within a previously developed data size reduction strategy for multi-capillary column-ion mobility spectrometry (MCC-IMS) datasets. Introduced for the first time in breath data analysis, the Top-hat method is presented as the optimum baseline correction method.
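As a rough illustration of the kind of baseline correction named in the abstract above, the following sketch assumes that "Top-hat" refers to the morphological white top-hat transform (the signal minus its morphological opening) with a flat window. It works on a one-dimensional signal only; the function names and the `half_width` parameter are hypothetical, and the snippet above does not describe the window shape, size, or how the method was applied to MCC-IMS data.

```python
def _sliding(values, half_width, reducer):
    """Apply `reducer` (min or max) over a centred window, truncated at the edges."""
    n = len(values)
    return [
        reducer(values[max(0, i - half_width): min(n, i + half_width + 1)])
        for i in range(n)
    ]


def tophat_baseline_correct(signal, half_width):
    """Return (corrected, baseline) using a white top-hat transform.

    Assumption: the baseline is estimated as the morphological opening
    (erosion followed by dilation) with a flat window of
    2 * half_width + 1 points. Illustrative sketch only.
    """
    eroded = _sliding(signal, half_width, min)     # erosion: local minimum
    baseline = _sliding(eroded, half_width, max)   # dilation of the erosion
    corrected = [s - b for s, b in zip(signal, baseline)]
    return corrected, baseline


# A narrow peak of height 4 sitting on a constant offset of 5.
corrected, baseline = tophat_baseline_correct([5, 5, 5, 9, 5, 5, 5], half_width=1)
```

The window must be wider than the peaks of interest and narrower than the baseline features; otherwise peaks are partly absorbed into the estimated baseline or baseline drift survives the correction.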
The selection of optimal preprocessing is among the main bottlenecks in chemometric data analysis. Preprocessing is currently a burden, since a multitude of different preprocessing methods is available for, e.g.
The present investigation uses proton transfer reaction mass spectrometry (PTR-MS) combined with multivariate and univariate statistical analyses to study potential biomarkers for altered metabolism in urine due to strenuous walking. Urine samples, together with breath and blood samples, were taken from 51 participants (23 controls, 11 type-1 diabetes, 17 type-2 diabetes) during the Dutch endurance walking event, the . Multivariate analysis allowed for discrimination between before and after exercise for all three groups (control, type-1 and type-2 diabetes) and on three out of four days.