We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds up run time by about 2- to 5-fold over our original gkmSVM algorithm. The package supports several sequence kernels, including gkmSVM, kmer-SVM, the mismatch kernel and the wildcard kernel.
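To make the gapped k-mer feature space concrete, the sketch below builds explicit gapped k-mer count vectors and takes their inner product as the kernel value. It is a naive illustration, not the package's algorithm. The helper names (`gapped_kmer_counts`, `gkm_kernel`, `normalized_gkm_kernel`) are hypothetical, reverse complements are not merged, and the caller must supply `l` (word length) and `k` (number of informative, non-gap positions).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def gapped_kmer_counts(seq: str, l: int, k: int) -> Counter:
    """Hypothetical helper: count gapped k-mers, i.e. l-mers with l - k positions masked by '.'.

    Reverse complements are NOT merged in this sketch.
    """
    counts: Counter = Counter()
    for start in range(len(seq) - l + 1):
        lmer = seq[start:start + l]
        # Every choice of k informative positions out of l defines one gap pattern.
        for keep in combinations(range(l), k):
            kept = set(keep)
            feature = "".join(c if i in kept else "." for i, c in enumerate(lmer))
            counts[feature] += 1
    return counts


def gkm_kernel(s1: str, s2: str, l: int, k: int) -> float:
    """Inner product of the two gapped k-mer count vectors."""
    c1 = gapped_kmer_counts(s1, l, k)
    c2 = gapped_kmer_counts(s2, l, k)
    return float(sum(n * c2[f] for f, n in c1.items()))


def normalized_gkm_kernel(s1: str, s2: str, l: int, k: int) -> float:
    """Cosine-normalized kernel, so that each sequence has self-similarity 1."""
    return gkm_kernel(s1, s2, l, k) / sqrt(
        gkm_kernel(s1, s1, l, k) * gkm_kernel(s2, s2, l, k)
    )
```

For example, `gkm_kernel("ACGT", "ACGA", 4, 3)` is 1.0: the two 4-mers differ only at the last position, so the only gapped 3-mer they share is `ACG.`. Explicit enumeration costs C(l, k) features per l-mer, which is why practical implementations avoid it.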
Availability and Implementation: The gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN) for Linux, Mac OS and Windows. The C++ implementation is available at www.beerlab.org/gkmsvm
Contact: mghandi@gmail.com or mbeer@jhu.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4937197 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btw203 | DOI Listing |
Bioinformatics
April 2021
Department of Data Science and Big Data Technology, College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, 430070 Wuhan, China.
Motivation: The scarcity of experimental data on transcription factor binding sites (TFBSs) in plants, together with the independent evolution of plant TFs, has left computational approaches for identifying plant TFBSs lagging behind comparable research in humans. Observing that TFs are highly conserved across plant species, we first employ a deep convolutional neural network (DeepCNN) to build 265 Arabidopsis TFBS prediction models from available DAP-seq (DNA affinity purification sequencing) datasets, and then transfer them to homologous TFs in other plants.
Results: DeepCNN not only outperforms gkm-SVM and MEME on Arabidopsis TFBS prediction but also offers biological interpretability: it learns the known motif for most Arabidopsis TFs, as well as cooperative TF motifs supported by protein-protein interaction evidence.
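CNN-based TFBS models typically begin by sliding filters along one-hot encoded DNA. The pure-Python sketch below shows only that first step: a single convolution filter followed by ReLU and global max pooling. The filter weights are handcrafted for a hypothetical `TATA` motif rather than learned, the function names are hypothetical, and nothing here reflects the actual DeepCNN architecture or training.

```python
ALPHABET = "ACGT"


def one_hot(seq: str) -> list[list[float]]:
    """Encode DNA as one row per base, with 1.0 in the A, C, G or T column."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def conv_scan(x: list[list[float]], w: list[list[float]], bias: float = 0.0) -> list[float]:
    """Valid 1-D convolution (cross-correlation, as in deep-learning layers) plus ReLU."""
    width = len(w)
    scores = []
    for i in range(len(x) - width + 1):
        s = bias + sum(
            x[i + j][c] * w[j][c] for j in range(width) for c in range(len(ALPHABET))
        )
        scores.append(max(s, 0.0))  # ReLU
    return scores


def max_pool(scores: list[float]) -> tuple[int, float]:
    """Global max pooling: the first best-scoring position and its score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical filter: +1 for each position matching TATA, 0 otherwise.
tata_filter = one_hot("TATA")
scores = conv_scan(one_hot("GGTATAGG"), tata_filter)
print(scores)            # [2.0, 0.0, 4.0, 0.0, 2.0]
print(max_pool(scores))  # (2, 4.0)
```

In a trained network the filter weights are learned from data, and comparing such learned filters with known motifs is one way to inspect what the model has captured.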
Bioinformatics
December 2020
Department of Computer Science, University of Virginia, Charlottesville, VA, USA.
Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences with modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation, as their run time depends exponentially on the sub-sequence feature length, the number of mismatch positions, and the task's alphabet size.
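The alphabet-size dependence is visible in the size of the gapped k-mer feature space: choosing k informative positions out of an l-mer and filling each from an alphabet of size |Σ| gives C(l, k)·|Σ|^k possible features. The hypothetical helper below just evaluates that count; it says nothing about the run time of any specific published algorithm.

```python
from math import comb


def gapped_kmer_feature_space(l: int, k: int, alphabet_size: int) -> int:
    """Hypothetical helper: number of distinct gapped k-mers of length l with k informative positions."""
    # C(l, k) gap patterns, each filled with alphabet_size ** k letter combinations.
    return comb(l, k) * alphabet_size ** k


for name, size in (("DNA", 4), ("protein", 20)):
    print(name, gapped_kmer_feature_space(10, 6, size))
# DNA 860160
# protein 13440000000
```

Moving from a 4-letter to a 20-letter alphabet at the same (l, k) multiplies the feature space by 5^6 = 15,625.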
Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels.
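One way to avoid enumerating gapped k-mer features is to rewrite the same kernel as a weighted count of l-mer pairs. A pair that differs at m positions shares exactly C(l - m, k) gapped k-mers, because a gap pattern matches both l-mers only if all k informative positions fall on agreeing positions. The sketch below computes the kernel this way. It is a naive, quadratic illustration of that identity with hypothetical function names, not the fast algorithm introduced in the paper.

```python
from itertools import product
from math import comb


def mismatch_profile(s1: str, s2: str, l: int) -> list[int]:
    """N[m] = number of (l-mer of s1, l-mer of s2) pairs differing at exactly m positions."""
    profile = [0] * (l + 1)
    lmers1 = [s1[i:i + l] for i in range(len(s1) - l + 1)]
    lmers2 = [s2[i:i + l] for i in range(len(s2) - l + 1)]
    for u, v in product(lmers1, lmers2):
        m = sum(a != b for a, b in zip(u, v))
        profile[m] += 1
    return profile


def gkm_kernel_from_profile(s1: str, s2: str, l: int, k: int) -> int:
    """Gapped k-mer kernel as sum over m of N[m] * C(l - m, k).

    math.comb returns 0 when l - m < k, so pairs with too many mismatches contribute nothing.
    """
    profile = mismatch_profile(s1, s2, l)
    return sum(n * comb(l - m, k) for m, n in enumerate(profile))
```

For `"ACGT"` and `"ACGA"` with l = 4, the profile is `[0, 1, 0, 0, 0]`, so with k = 3 the kernel is C(3, 3) = 1, the same value an explicit feature-vector inner product gives.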
Bioinformatics
July 2016
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.