Publications by authors named "Polikar R"

Motivation: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program for variations in canonicality, k-mer size, databases, and input sample data size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge.
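As a rough illustration of what "canonicality" and k-mer size mean in this setting, the sketch below counts k-mers while optionally collapsing each k-mer with its reverse complement. It is a generic, hedged illustration of the preprocessing idea, not NBC++'s actual implementation.

```python
# Minimal canonical k-mer counting sketch (illustrative; not NBC++ code).
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)

def count_kmers(seq: str, k: int, use_canonical: bool = True) -> Counter:
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts

print(count_kmers("ACGTACGT", k=3))
```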

A major challenge for clustering algorithms is to balance the trade-off between homogeneity, i.e., the degree to which an individual cluster includes only related sequences, and completeness, i.e., the degree to which related sequences are kept in a single cluster rather than broken up into multiple clusters. Most algorithms are conservative in grouping sequences: remote homologs may fail to be clustered together and instead form unnecessarily distinct clusters.
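This trade-off is easy to see with scikit-learn's standard clustering metrics; the toy labels below are illustrative, not data from the study.

```python
# Over-splitting maximizes homogeneity; over-merging maximizes completeness.
from sklearn.metrics import homogeneity_score, completeness_score

truth = [0, 0, 0, 1, 1, 1]   # two families of related sequences

split = [0, 1, 2, 3, 4, 5]   # every cluster is pure, but families are broken apart
print(homogeneity_score(truth, split),    # high homogeneity
      completeness_score(truth, split))   # low completeness

merged = [0, 0, 0, 0, 0, 0]  # families kept together, but unrelated sequences mixed
print(homogeneity_score(truth, merged),   # low homogeneity
      completeness_score(truth, merged))  # high completeness
```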

Objective: To determine how well machine learning algorithms can classify mild cognitive impairment (MCI) subtypes and Alzheimer's disease (AD) using features obtained from the digital Clock Drawing Test (dCDT).

Methods: dCDT protocols were administered to 163 patients diagnosed with AD (n = 59), amnestic MCI (aMCI; n = 26), combined mixed/dysexecutive MCI (mixed/dys MCI; n = 43), and patients without MCI (non-MCI; n = 35) using standard clock drawing command and copy procedures, that is, draw the face of the clock, put in all of the numbers, and set the hands for "10 after 11." A digital pen and custom software recorded patients' drawings.

Feature subset selection can be used to sieve through large volumes of data and discover the most informative subset of variables for a particular learning problem. Yet, due to memory and other resource constraints (e.g.

Many machine learning applications are now associated with very large data sets whose sizes were almost unimaginable just a short time ago. As a result, many current algorithms cannot handle, or do not scale to, today's extremely large volumes of data. Fortunately, not all features that make up a typical data set carry information that is relevant or useful for prediction, and identifying and removing such irrelevant features can significantly reduce the total data size.
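One widely used way to identify and drop such irrelevant features is to rank them by a simple relevance score and keep only the top-ranked ones; the hedged sketch below does this with mutual information on synthetic data (the paper's own methods may differ).

```python
# Keep only the 10 features most informative about the label (illustrative).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=5, random_state=0)
X_small = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)
print(X.shape, "->", X_small.shape)   # (500, 100) -> (500, 10)
```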

Introduction: The dynamic range of cerebrospinal fluid (CSF) amyloid β (Aβ) measurement does not parallel cognitive changes in Alzheimer's disease (AD) and cognitively normal (CN) subjects across different studies. Therefore, identifying novel proteins to characterize symptomatic AD samples is important.

Methods: Proteins were profiled using a multianalyte platform by Rules Based Medicine (MAP-RBM).

Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set.
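A common instance of such unsupervised, layered feature learning is the autoencoder; the minimal sketch below (placeholder sizes and random data, not the authors' model) learns a compressed nonlinear representation by reconstructing its input.

```python
# Tiny autoencoder: learn 8 nonlinear features from 64-dimensional inputs.
import torch
import torch.nn as nn

X = torch.rand(256, 64)              # e.g., 64 composition features per read

model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),    # encoder
    nn.Linear(32, 8),  nn.ReLU(),    # 8-dimensional learned representation
    nn.Linear(8, 32),  nn.ReLU(),    # decoder
    nn.Linear(32, 64),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(50):                  # train to reconstruct the input
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)
    loss.backward()
    opt.step()
print(f"reconstruction loss: {loss.item():.4f}")
```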

Introduction: Reductions of cerebrospinal fluid (CSF) amyloid-beta (Aβ42) and elevated phosphorylated-tau (p-Tau) reflect in vivo Alzheimer's disease (AD) pathology and show utility in predicting conversion from mild cognitive impairment (MCI) to dementia. We investigated the P50 event-related potential component as a noninvasive biomarker of AD pathology in non-demented elderly.

Methods: Thirty-six MCI patients were stratified into amyloid positive (MCI-AD, n=17) and negative (MCI-Other, n=19) groups using CSF levels of Aβ42.

Selection of the most informative features, those that lead to a small loss on future data, is arguably one of the most important steps in classification, data analysis, and model selection. Several feature selection (FS) algorithms are available; however, due to noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining if a feature is statistically relevant.
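One simple way to cast this as a hypothesis test, in the spirit of (though not necessarily identical to) the approach in the brief, is to count how often a base selector picks a feature across repeated runs and compare that count against a binomial "selected by chance" null:

```python
# Binomial relevance test sketch: is a feature selected more often than chance?
from scipy.stats import binom

n_runs, k_selected, n_features = 100, 10, 200
p0 = k_selected / n_features     # chance of selection under the null

def is_relevant(times_selected: int, alpha: float = 0.01) -> bool:
    p_value = binom.sf(times_selected - 1, n_runs, p0)  # P(X >= count)
    return p_value < alpha

print(is_relevant(5))    # around chance level -> False
print(is_relevant(40))   # far above chance    -> True
```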

An increasing number of real-world applications are associated with streaming data drawn from drifting and nonstationary distributions that change over time. These applications demand new algorithms that can learn and adapt to such changes, also known as concept drift. Proper characterization of such data with existing approaches typically requires a substantial number of labeled instances, which may be difficult, expensive, or even impractical to obtain.
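As a toy illustration of the phenomenon (not any particular algorithm), monitoring a statistic over a sliding window makes an abrupt drift visible:

```python
# Crude drift signal: the window mean jumps when the distribution changes.
import numpy as np

rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 500),   # old concept
                         rng.normal(3, 1, 500)])  # new concept at t = 500

window, baseline = 50, None
for t in range(0, len(stream), window):
    m = stream[t : t + window].mean()
    if baseline is not None and abs(m - baseline) > 1.0:
        print(f"possible drift near t={t}")
    baseline = m   # adapt the reference to the most recent window
```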

The objective of this research was to assess the utility of a simple near infrared spectroscopy (NIRS) technology for objective assessment of the hemodynamic response to acute pain. For this exploration, we used functional near infrared spectroscopy (fNIRS) to measure the hemodynamic response on the forehead during three trials of a cold pressor test (CPT) in 20 adults. To measure hemodynamic changes at the superficial tissues as well as the intracranial tissues, two configurations of 'far' and 'near' source-detector separations were used.

Due to the enormous size of the solution space for sequential ordering problems, non-exhaustive heuristic techniques have been the focus of many research efforts, particularly in the field of operations research. In this paper, we outline an ecologically motivated problem in which environmental samples have been obtained along a gradient (e.g.

As life expectancy increases, particularly in the developed world, so does the prevalence of Alzheimer's Disease (AD). AD is a neurodegenerative disorder characterized by neurofibrillary plaques and tangles in the brain that lead to neuronal death and dementia. Early diagnosis of AD is still a major unresolved health concern: several biomarkers are being investigated, among which the electroencephalogram (EEG) provides the only option for electrophysiological information.

We introduce an ensemble-of-classifiers-based approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn(++).NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn from environments that experience constant or variable rates of drift, addition or deletion of concept classes, as well as cyclical drift.
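The following is a deliberately simplified sketch of that strategy: one classifier per batch, member weights recomputed from each member's error on the newest batch, and prediction by weighted majority vote. It omits Learn(++).NSE's sigmoid-based averaging of errors over time, so read it as the general idea rather than the algorithm itself.

```python
# Simplified batch-incremental ensemble (not the full Learn(++).NSE algorithm).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SimpleNSE:
    def __init__(self):
        self.members, self.weights = [], []

    def partial_fit(self, X, y):
        self.members.append(DecisionTreeClassifier(max_depth=3).fit(X, y))
        # re-weight every member by its log-odds of error on the current batch
        self.weights = []
        for m in self.members:
            err = np.clip(np.mean(m.predict(X) != y), 1e-3, 0.499)
            self.weights.append(np.log((1 - err) / err))

    def predict(self, X):
        # weighted majority vote over all ensemble members
        preds = np.array([m.predict(X) for m in self.members])
        classes = np.unique(preds)
        scores = np.zeros((len(classes), X.shape[0]))
        for p, w in zip(preds, self.weights):
            for ci, c in enumerate(classes):
                scores[ci] += w * (p == c)
        return classes[scores.argmax(axis=0)]
```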

Analysis of DNA sequences isolated directly from the environment, known as metagenomics, produces a large quantity of genome fragments that need to be classified into specific taxa. Most composition-based classification methods use all features instead of a subset of features that may maximize classifier accuracy. We show that feature selection methods can boost performance of taxonomic classifiers.

High-throughput sequencing technologies enable metagenome profiling, the simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data includes sequence fragments ("reads") from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but composition-based methods have not been adapted.

The alarmingly increasing prevalence of Alzheimer's disease (AD) due to the aging population in developing countries, combined with the lack of standardized and conclusive diagnostic procedures, makes early diagnosis of AD a major public health concern. While no current medical treatment exists to stop or reverse this disease, recent dementia-specific pharmacological advances can slow its progression, making early diagnosis all the more important. Several noninvasive biomarkers have been proposed, including P300-based EEG analysis, MRI volumetric analysis, and PET-based metabolic activity analysis, as alternatives to neuropsychological evaluation, the current gold standard of diagnosis.

Traditionally, studies in microbial genomics have focused on single genomes from cultured species, thereby limiting them to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data.

A significant proportion of patients with heart failure have a normal ventricular ejection fraction on echocardiographic examination. Previously called diastolic heart failure, this condition is nowadays referred to as heart failure with normal ejection fraction (HFNEF) or HF with preserved ejection fraction. The European Society of Cardiology, recognizing the importance of this type of heart failure, recently issued new definition criteria for it.

As the average life expectancy increases, particularly in developing countries, the prevalence of neurodegenerative diseases has also increased. This trend is especially alarming for Alzheimer's disease (AD), as there is no cure to stop or reverse its effects. Recent pharmacological advances can, however, slow the progression of AD, but only if it is diagnosed at an early stage.

The prevalence of Alzheimer's disease (AD) is rising alarmingly as the average age of our population increases. There is no treatment to halt or slow the pathology responsible for AD; however, new drugs show promise in reducing the rate of progression. At the same time, the efficacy of these new medications critically depends on our ability to diagnose AD at the earliest stage.

We have previously introduced an incremental learning algorithm, Learn(++), which learns novel information from consecutive data sets by generating an ensemble of classifiers with each data set, and combining them by weighted majority voting. However, Learn(++) suffers from an inherent "outvoting" problem when asked to learn a new class ω_new introduced by a subsequent data set, as earlier classifiers not trained on this class are guaranteed to misclassify ω_new instances. The collective votes of earlier classifiers for an inevitably incorrect decision then outweigh the votes of the new classifiers' correct decision on ω_new instances, until there are enough new classifiers to counteract the unfair outvoting.
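The arithmetic of the problem is easy to reproduce; in this toy weighted-vote tally, ten earlier members that must vote incorrectly on an ω_new instance overwhelm two correct new members:

```python
# Ten old members (never saw the new class) outvote two new, correct members.
old_votes = [("old_class", 1.0)] * 10   # guaranteed wrong on the new class
new_votes = [("new_class", 1.0)] * 2    # trained on the new class, correct

tally = {}
for label, weight in old_votes + new_votes:
    tally[label] = tally.get(label, 0.0) + weight

print(max(tally, key=tally.get))        # -> old_class: the ensemble is outvoted
```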

Objective: To determine whether automated classifiers can correctly identify target categorization responses from averaged event-related potentials (ERPs), and to identify appropriate features and classification models for computer-assisted investigation of attentional processes.

Methods: ERPs were recorded during a target categorization task. Automated classification of average target ERPs versus average non-target ERPs was performed by extracting different combinations of features from the P300 and N200 components, which were used to train six classifiers: Euclidean classifier (EC), Mahalanobis discriminant (MD), quadratic classifier (QC), Fisher linear discriminant (FLD), multi-layer perceptron neural network (MLP) and support vector machine (SVM).
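Below is a hedged sketch of this kind of comparison using scikit-learn analogues for four of the six models (nearest centroid for the Euclidean classifier, LDA for the Fisher discriminant, plus MLP and SVM); the features are random placeholders standing in for the P300/N200 measures, not the study's data.

```python
# Compare classifier analogues with cross-validation on placeholder features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))       # stand-ins for P300/N200 amplitudes/latencies
y = rng.integers(0, 2, size=80)    # target vs. non-target average ERPs

for name, clf in [("EC", NearestCentroid()),
                  ("FLD", LinearDiscriminantAnalysis()),
                  ("MLP", MLPClassifier(max_iter=2000, random_state=0)),
                  ("SVM", SVC())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean().round(2))
```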

It has been widely accepted that classification accuracy can be improved by combining the outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures, many of which are heuristic in nature, has been developed for this goal.
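Even the two simplest rules can disagree, which is part of why the problem remains open; the made-up posteriors below make hard majority voting and the soft sum rule choose different classes:

```python
# Majority vote vs. sum rule on the same (made-up) classifier outputs.
import numpy as np

# posteriors from three classifiers for one sample over classes {A, B}
posteriors = np.array([[0.60, 0.40],   # classifier 1: weakly prefers A
                       [0.55, 0.45],   # classifier 2: weakly prefers A
                       [0.10, 0.90]])  # classifier 3: strongly prefers B

hard = np.bincount(posteriors.argmax(axis=1)).argmax()  # majority vote -> A
soft = posteriors.sum(axis=0).argmax()                  # sum rule      -> B
print("majority vote:", "AB"[hard], "| sum rule:", "AB"[soft])
```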
