We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNPs) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially the ones with weak effects.
Human life has been at the edge of catastrophe for millennia due to diseases which emerge and reemerge at random. The recent outbreak of the Zika virus (ZIKV) is one such menace that shook the global public health community abruptly. Modern technologies, including computational tools as well as experimental approaches, need to be harnessed fast and effectively in a coordinated manner in order to properly address such challenges.
In this paper we used two sets of calculated molecular descriptors to predict blood-brain barrier (BBB) entry of a collection of 415 chemicals. The first set of 579 descriptors was calculated by Schrodinger and TopoCluj software. Polly and Triplet software were used to calculate the second set of 198 descriptors.
The recent Zika virus (ZIKV) epidemic in the Americas ranks among the largest outbreaks in modern times. Like other mosquito-borne flaviviruses, ZIKV circulates in sylvatic cycles among primates that can serve as reservoirs of spillover infection to humans. Identifying sylvatic reservoirs is critical to mitigating spillover risk, but relevant surveillance and biological data remain limited for this and most other zoonoses.
Curr Comput Aided Drug Des
January 2019
Background: Proper validation is an important aspect of QSAR modelling. External validation is one of the widely used validation methods in QSAR where the model is built on a subset of the data and validated on the rest of the samples. However, its effectiveness for datasets with a small number of samples but a large number of predictors remains suspect.
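The external validation scheme described above (fit on a subset, score on the held-out rest) can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the function names, the single-predictor least-squares model, the 25% hold-out fraction, and the synthetic data are all assumptions made for the example.

```python
import random


def external_validation_split(n_samples, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) for one random hold-out split."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = max(1, round(n_samples * test_fraction))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def q2_external(y_true, y_pred, y_train_mean):
    """1 - PRESS / SS, with SS measured around the training-set mean."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss = sum((t - y_train_mean) ** 2 for t in y_true)
    return 1.0 - press / ss


# Synthetic, noise-free toy data (an assumption, purely for illustration).
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]

train_idx, test_idx = external_validation_split(len(x))
intercept, slope = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
y_train_mean = sum(y[i] for i in train_idx) / len(train_idx)
y_pred = [intercept + slope * x[i] for i in test_idx]
q2 = q2_external([y[i] for i in test_idx], y_pred, y_train_mean)
```

With few samples, the held-out set contains only a handful of points, so the score can change noticeably from one random split to another; rerunning with different `seed` values on real data is one way to see that.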
Background: Computed mathematical descriptors of molecules are used for the prediction of their property/bioactivity. In the 1970s only a few descriptors could be calculated; currently available software can calculate a large number of descriptors for molecules or biomolecules such as DNA/RNA and proteins.
Objective: When p molecular descriptors are calculated for n molecules, the data set can be viewed as n vectors in p dimensions, each chemical being represented as a point in p-dimensional space.
Curr Comput Aided Drug Des
April 2016
Variation in high-dimensional data is often caused by a few latent factors, and hence dimension reduction or variable selection techniques are often useful in gathering useful information from the data. In this paper we consider two such recent methods: Interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures like ridge regression and linear discriminant analysis, and apply them on two data sets which have more predictors than samples (i.
Interrelated Two-way Clustering (ITC) is an unsupervised clustering method developed to divide samples into two groups in gene expression data obtained through microarrays, selecting important genes simultaneously in the process. This has been found to be a better approach than conventional clustering methods like K-means or self-organizing maps for scenarios in which the number of samples is much smaller than the number of variables (n ≪ p). In this paper we used the ITC approach for classification of a diverse set of 508 chemicals regarding mutagenicity.