BMC Med Inform Decis Mak
February 2024
Background: Unsupervised clustering and outlier detection are important in medical research to understand the distributional composition of a collective of patients. A number of clustering methods exist, also for high-dimensional data after dimension reduction. Clustering and outlier detection may, however, become less robust or contradictory if multiple high-dimensional data sets per patient exist.
Background: The ever-increasing availability of high-density genomic markers in the form of single nucleotide polymorphisms (SNPs) enables genomic prediction, i.e. the inference of phenotypes based solely on genomic data, in the field of animal and plant breeding, where it has become an important tool.
Introduction: Naturally attenuated Langat virus (LGTV) and highly pathogenic tick-borne encephalitis virus (TBEV) share antigenically similar viral proteins and are grouped together in the same flavivirus serocomplex. In the early 1970s, this similarity encouraged the use of LGTV as a potential live attenuated vaccine against tick-borne encephalitis (TBE) until cases of encephalitis were reported among vaccinees. We have previously shown in a mouse model that immunity induced against LGTV protects mice against lethal TBEV challenge infection.
Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. As a result, the reported accuracy is either too low or too optimistic, and the estimated model performance cannot be reproduced on independent data. It is then also doubtful whether the classifier qualifies for clinical use.
To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently generated by means of DNA microarray or RNA-seq technology. Such data can also be used to classify infected individuals by molecular signatures in the form of machine-learning models with genes as predictor variables. Early diagnosis of patients by molecular signatures could also contribute to better treatments.
Estimating the taxonomic composition of viral sequences in a biological sample processed by next-generation sequencing is an important step in comparative metagenomics. Mapping sequencing reads against a database of known viral reference genomes, however, fails to classify reads from novel viruses whose reference sequences are not yet available in public databases. Instead of a mapping approach, and in order to classify sequencing reads at least to a taxonomic level, the performance of artificial neural networks and other machine learning models was studied.