Identification of splice sites is essential for predicting gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identifying splice sites. However, nucleotide sequences must first be transformed into numeric features through sequence encoding before they can be used as input to MLAs. In this study, we evaluated the performance of 8 sequence encoding schemes, namely Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of 1st and 2nd order Markov models (MM1 + MM2) and 2nd order Markov model (MM2), for predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, namely A. thaliana, C. elegans, D. melanogaster and H. sapiens, and the performances were then validated in another 4 species, namely Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver operating characteristic) and PR (precision-recall) curves, the FDTF encoding achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM achieved the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination outperformed the other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were higher for species with low intron density. To the best of our knowledge, this is the first comprehensive evaluation of sequence encoding schemes for splice site prediction.
We have also developed an R package, EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html), for encoding splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. We expect this study to be useful to computational biologists for predicting different functional elements in genomic DNA.
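To make the idea of sequence encoding concrete, the following is a minimal Python sketch of two encodings in the spirit of the sparse and 1st order Markov model schemes named above. It is not the EncDNA implementation and may differ from the exact definitions used in the study. The function names, the pseudocount, and the choice of position-specific transition probabilities as features are assumptions made for illustration only.

```python
from collections import defaultdict

NUCLEOTIDES = "ACGT"


def sparse_encode(seq):
    """Sparse (one-hot) encoding: each nucleotide becomes four 0/1 values.

    Characters other than A, C, G, T are encoded as four zeros.
    """
    features = []
    for base in seq.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def fit_first_order_markov(train_seqs, pseudocount=1.0):
    """Estimate position-specific transition probabilities P(x_i | x_{i-1}).

    Assumes aligned, equal-length sequences over A, C, G, T (hypothetical
    training set, e.g. true splice site windows). The pseudocount is an
    illustrative smoothing choice, not taken from the study.
    """
    length = len(train_seqs[0])
    counts = [defaultdict(float) for _ in range(length - 1)]
    for seq in train_seqs:
        seq = seq.upper()
        for i in range(1, length):
            counts[i - 1][(seq[i - 1], seq[i])] += 1.0

    probs = []
    for pos_counts in counts:
        table = {}
        for prev in NUCLEOTIDES:
            total = sum(pos_counts[(prev, cur)] + pseudocount
                        for cur in NUCLEOTIDES)
            for cur in NUCLEOTIDES:
                table[(prev, cur)] = (pos_counts[(prev, cur)] + pseudocount) / total
        probs.append(table)
    return probs


def markov_encode(seq, probs):
    """Encode a sequence as its per-position transition probabilities."""
    seq = seq.upper()
    return [probs[i - 1][(seq[i - 1], seq[i])] for i in range(1, len(seq))]


if __name__ == "__main__":
    true_sites = ["AGGT", "AGGT", "CGGT"]  # toy data, not real splice sites
    model = fit_first_order_markov(true_sites)
    print(sparse_encode("AGGT"))
    print(markov_encode("AGGT", model))
```

Either feature vector could then be passed to a classifier such as an SVM or a random forest. A sequence of length n yields 4n features under the sparse scheme and n - 1 features under this Markov sketch.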
DOI: http://dx.doi.org/10.1016/j.gene.2019.04.047
Nat Photonics
October 2024
Institut national de la recherche scientifique, Centre Énergie Matériaux Télécommunications, Varennes, Quebec Canada.
Quantum walks on photonic platforms represent a physics-rich framework for quantum measurements, simulations and universal computing. Dynamic reconfigurability of photonic circuitry is key to controlling the walk and retrieving its full operation potential. Universal quantum processing schemes based on time-bin encoding in gated fibre loops have been proposed but not demonstrated yet, mainly due to gate inefficiencies.
BMC Genomics
January 2025
Unit of Mycoplasmas, Laboratory of Molecular Microbiology, Vaccinology and Biotechnology Development, Institut Pasteur de Tunis, University Tunis El Manar, Tunis, Tunisia.
Background: Avian mycoplasmas are small bacteria associated with several pathogenic conditions in many wild and poultry bird species. Extensive genomic data are available for many avian mycoplasmas, yet no comparative studies focusing on this group of mycoplasmas have been undertaken so far.
Results: Here, based on the comparison of forty avian mycoplasma genomes belonging to ten different species, we provide insightful information on the phylogeny, pan/core genome, energetic metabolism, and virulence of these avian pathogens.
PLoS One
January 2025
Department of Computer Science and Engineering at Hanyang University ERICA, Ansan-si, Gyeonggi-do, South Korea.
Privacy-preserving record linkage (PPRL) technology, crucial for linking records across datasets while maintaining privacy, is susceptible to graph-based re-identification attacks. These attacks compromise privacy and pose significant risks, such as identity theft and financial fraud. This study proposes a zero-relationship encoding scheme that minimizes the linkage between source and encoded records to enhance PPRL systems' resistance to re-identification attacks.
ACS Appl Mater Interfaces
January 2025
Institute of Optoelectronic Technology, Fuzhou University, Fuzhou 350116, China.
Anticounterfeiting technologies face challenges in the Internet of Things era due to the rapidly growing volume of objects, their frequent connection with humans, and the accelerated advance of counterfeiting/cracking techniques. Here, inspired by biological fingerprints, we present a simple anticounterfeiting system based on a perovskite quantum dot (PQD) fingerprint physical unclonable function (FPUF) that cooperatively uses the spontaneous phase separation of polymers and the selective in situ synthesis of PQDs as an entropy source. The FPUFs offer red, green, and blue full-color fingerprint identifiers and random three-dimensional (3D) morphology, which extends binary to multivalued encoding by tuning the perovskite and polymer components, enabling a high encoding capacity (about 10, far surpassing that of biometric fingerprints).
Entropy (Basel)
December 2024
Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China.
Image segmentation is a crucial task in artificial intelligence fields such as computer vision and medical imaging. While convolutional neural networks (CNNs) have achieved notable success by learning representative features from large datasets, they often lack geometric priors and global object information, limiting their accuracy in complex scenarios. Variational methods like active contours provide geometric priors and theoretical interpretability but require manual initialization and are sensitive to hyper-parameters.