gkmSVM: an R package for gapped-kmer SVM.

Bioinformatics

McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.

Published: July 2016

AI Article Synopsis

  • A new R package for training gapped-kmer SVM classifiers has been developed, featuring a kernel matrix algorithm that runs about 2 to 5 times faster than the original gkmSVM algorithm.
  • The package supports various sequence kernels, including gkmSVM, kmer-SVM, mismatch kernel, and wildcard kernel.
  • It is freely available on CRAN for multiple operating systems, with further resources accessible at www.beerlab.org/gkmsvm and additional data available online.

Article Abstract

We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds up run time about 2- to 5-fold over our original gkmSVM algorithm. The package supports several sequence kernels, including gkmSVM, kmer-SVM, the mismatch kernel and the wildcard kernel.
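As a rough illustration of what a gapped k-mer kernel measures, the sketch below counts gapped k-mers by brute force and compares two sequences by the normalized inner product of their count vectors. It is a minimal Python sketch written for this summary, not the package's code: the function names and the small values of `l` and `k` are hypothetical, and the package's improved algorithm is described as avoiding this kind of slow explicit enumeration.

```python
from collections import Counter
from itertools import combinations
import math


def gapped_kmer_counts(seq, l=4, k=2):
    """Count gapped k-mers in seq.

    Each length-l window contributes one feature per choice of k
    informative positions; the other l - k positions are masked as '-'.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter()
    position_sets = list(combinations(range(l), k))
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for positions in position_sets:
            keep = set(positions)
            feature = "".join(
                c if i in keep else "-" for i, c in enumerate(window)
            )
            counts[feature] += 1
    return counts


def gkm_kernel(seq_a, seq_b, l=4, k=2):
    """Normalized inner product of two gapped k-mer count vectors."""
    a = gapped_kmer_counts(seq_a, l, k)
    b = gapped_kmer_counts(seq_b, l, k)
    dot = sum(v * b[f] for f, v in a.items())
    norm = math.sqrt(
        sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(gkm_kernel("ACGTACGTAC", "ACGTTCGTAC"))
```

A kernel matrix for an SVM would be filled by evaluating `gkm_kernel` on every pair of training sequences, which is why the cost of this step dominates training time.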

Availability and Implementation: The gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN) for Linux, Mac OS and Windows platforms. The C++ implementation is available at www.beerlab.org/gkmsvm.

Contact: mghandi@gmail.com or mbeer@jhu.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4937197
DOI: http://dx.doi.org/10.1093/bioinformatics/btw203


Similar Publications

TSPTFBS: a Docker image for trans-species prediction of transcription factor binding sites in plants.

Bioinformatics

April 2021

Department of Data Science and Big Data Technology, College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, 430070 Wuhan, China.

Motivation: Experimental data on transcription factor binding sites (TFBSs) in plants are lacking or limited, and plant TFs have evolved independently; as a result, computational approaches for identifying plant TFBSs lag behind the corresponding research in humans. Observing that TFs are highly conserved among plant species, we first employ a deep convolutional neural network (DeepCNN) to build 265 Arabidopsis TFBS prediction models based on available DAP-seq (DNA affinity purification sequencing) datasets, and then transfer them to homologous TFs in other plants.

Results: DeepCNN not only outperforms gkm-SVM and MEME on Arabidopsis TFBS prediction but also learns the known motif for most Arabidopsis TFs, as well as cooperative TF motifs supported by protein-protein interaction evidence, which provides biological interpretability.


Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences with modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation, as their run time depends exponentially on the sub-sequence feature length, the number of mismatch positions, and the task's alphabet size.

Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels.
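One way to get a feel for the growth mentioned in the motivation is to count the distinct gapped k-mer features for a given word length, number of informative positions, and alphabet size. The sketch below is a hypothetical Python helper written for this summary; it counts features only and makes no claim about the exact run-time formula of any particular gkm-SVM implementation. The parameter values are illustrative.

```python
from math import comb


def gapped_kmer_feature_count(l, k, alphabet_size=4):
    """Number of distinct gapped k-mers of total length l with k
    informative positions: choose the positions, then fill each
    one with any letter of the alphabet."""
    return comb(l, k) * alphabet_size ** k


if __name__ == "__main__":
    # DNA (4 letters) versus protein (20 letters), illustrative l and k.
    for alphabet_size in (4, 20):
        print(alphabet_size, gapped_kmer_feature_count(10, 6, alphabet_size))
```

Moving from a DNA alphabet to a protein alphabet multiplies the feature space by 5 for every informative position, which is why alphabet size matters so much for methods that enumerate features.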


