Publications by authors named "Jacqueline M Hughes-Oliver"

In virtual screening for drug discovery, hit enrichment curves are widely used to assess how well ranking algorithms achieve early enrichment. Unfortunately, researchers almost never consider the uncertainty associated with estimating such curves before declaring differences in the performance of competing algorithms. Uncertainty is often large because the testing fractions of interest to researchers are small.
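As a rough illustration of the quantities involved, the sketch below computes a hit enrichment curve (the fraction of all actives recovered within the top-ranked fraction of compounds) together with a naive standard error. The function name `hit_enrichment`, the toy scores, and the binomial approximation are assumptions made for illustration only; the approximation treats the score threshold as fixed and is not the uncertainty method developed in the article.

```python
import math

def hit_enrichment(scores, labels, fractions):
    """Return (fraction, recall, naive_se) for each testing fraction.

    scores: higher means "more likely active"; labels: 1 = active, 0 = inactive.
    naive_se is a simple binomial approximation (an illustrative assumption).
    """
    n = len(scores)
    n_active = sum(labels)
    if n_active == 0:
        raise ValueError("at least one active compound is required")
    # Rank compounds from highest to lowest score (ties keep input order).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    results = []
    for f in fractions:
        n_tested = max(1, math.ceil(f * n))   # number of compounds tested
        hits = sum(ranked[:n_tested])         # actives among those tested
        recall = hits / n_active              # height of the curve at f
        se = math.sqrt(recall * (1 - recall) / n_active)
        results.append((f, recall, se))
    return results

# Toy data: 10 compounds, 4 of them active.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
curve = hit_enrichment(scores, labels, [0.2, 0.5, 1.0])
```

With so few actives, the naive standard error at the 20% testing fraction is nearly as large as the estimated recall itself, which is the kind of situation in which declaring one algorithm better than another is risky.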


The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of these models. While focused on implementing methods for model fitting and assessment that have been accepted by experts in the cheminformatics field, all of the methods in chemmodlab have broad utility for the machine learning community. chemmodlab contains several assessment utilities, including a plotting function that constructs accumulation curves and a function that computes many performance measures.


Permeation of chemical solutes through skin can create major health issues. Using the membrane-coated fiber (MCF) as a solid phase membrane extraction (SPME) approach to simulate skin permeation, we obtained partition coefficients for 37 solutes under 90 treatment combinations that broadly represent formulations associated with occupational skin exposure. These formulations were designed to mimic fluids in the metalworking process, and they are defined in this manuscript using: one of mineral oil, polyethylene glycol-200, soluble oil, synthetic oil, or semi-synthetic oil; at a concentration of 0.


A new classification method called the Optimal Bit String Tree is proposed to identify quantitative structure-activity relationships (QSARs). The method introduces the concept of a chromosome to describe the presence/absence context of a combination of descriptors. A descriptor set and its optimal chromosome form the splitting variable.


ChemModLab, written by the ECCR @ NCSU consortium under NIH support, is a toolbox for fitting and assessing quantitative structure-activity relationships (QSARs). Its elements are: a cheminformatic front end used to supply molecular descriptors for use in modeling; a set of methods for fitting models; and methods for validating the resulting model. Compounds may be input as structures from which standard descriptors will be calculated using the freely available cheminformatic front end PowerMV; PowerMV also supports compound visualization.


Ensemble methods have become popular for QSAR modeling, but most studies have assumed balanced data, consisting of approximately equal numbers of active and inactive compounds. Cheminformatics data are often far from being balanced. We extend the application of ensemble methods to include cases of imbalance of class membership and to more adequately assess model output.
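To see why class imbalance complicates assessment, consider the minimal sketch below. It is an illustration under stated assumptions, not the assessment procedure proposed in the article: the function name `assessment_measures` and the toy data (5 actives among 100 compounds) are hypothetical. A model that predicts every compound as inactive scores high on plain accuracy yet identifies no actives, which balanced accuracy exposes.

```python
def assessment_measures(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and balanced accuracy.

    Labels are 1 (active) or 0 (inactive); both classes must be present.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # fraction of actives correctly flagged
    specificity = tn / (tn + fp)   # fraction of inactives correctly flagged
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Toy imbalanced data: 5 actives among 100 compounds.
y_true = [1] * 5 + [0] * 95
# A trivial "model" that calls every compound inactive.
y_pred = [0] * 100
measures = assessment_measures(y_true, y_pred)
```

Here plain accuracy is 0.95 even though sensitivity is 0, while balanced accuracy is 0.5, no better than guessing.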


We suggest a parametric modeling approach for nonstationary spatial processes driven by point sources. Baseline near-stationarity, which may be reasonable in the absence of a point source, is modeled using a conditional autoregressive (CAR) Markov random field. Variability due to the point source is captured by our proposed autoregressive point source (ARPS) model.


Motivation: New biological systems technologies give scientists the ability to measure thousands of bio-molecules including genes, proteins, lipids and metabolites. We use domain knowledge, e.g.


Drug discovery depends on finding a very small number of biologically active or potent compounds among millions of compounds stored in chemical collections. Quantitative structure-activity relationships suggest that the potency of a compound is highly related to that compound's chemical makeup or structure. To improve the efficiency of cell-based analysis methods for high throughput screening, where information on a compound's structure is used to predict potency, we consider a number of potentially influential factors in the cell-based approach.
