Publications by authors named "Eugene V Korotkov"

The exact identification of promoter sequences remains a serious problem in computational biology, as the promoter prediction algorithms under development continue to produce false-positive results. Therefore, to fully assess the validity of predicted sequences, it is necessary to perform a comprehensive test of their properties, such as the presence of downstream transcribed DNA regions behind them, or chromatin accessibility for transcription factor binding. In this paper, we examined the promoter sequences of chromosome 1 of the rice genome from the Database of Potential Promoter Sequences predicted using a mathematical algorithm based on the derivation and calculation of statistically significant promoter classes.


The aim of this work was to compare the multiple alignment methods MAHDS, T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK in their ability to align highly divergent amino acid sequences. To accomplish this, we created test amino acid sequences with an average number of substitutions per amino acid (x) from 0.6 to 5.
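One way to picture such test data is sketched below. This is only an illustration under stated assumptions, not the procedure used in the paper: the function name `mutate`, the uniform choice of positions, and the uniform choice of replacement residues are all hypothetical. It applies `round(x * length)` substitution events to a sequence, so a single position can be hit more than once when x is large.

```python
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, x, rng):
    """Apply round(x * len(seq)) random substitution events to seq.

    Assumption: positions and replacement residues are drawn uniformly;
    each event replaces the current residue with a different one.
    """
    residues = list(seq)
    n_events = round(x * len(residues))
    for _ in range(n_events):
        pos = rng.randrange(len(residues))
        choices = [a for a in AMINO_ACIDS if a != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    original = "ACDEFGHIKL" * 10
    for x in (0.6, 2.0, 5.0):
        print(x, mutate(original, x, rng))
```

Because repeated hits can restore the original residue, the observed fraction of changed positions saturates as x grows, which is what makes alignment at high x difficult.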


Currently, there is a lack of bioinformatics approaches to identify highly divergent tandem repeats (TRs) in eukaryotic genomes. Here, we developed a new mathematical method to search for TRs, which uses a novel algorithm for constructing multiple alignments based on the generation of random position weight matrices (RPWMs), and applied it to detect TRs 2 to 50 nucleotides long in the rice genome. The RPWM method could find highly divergent TRs in the presence of insertions or deletions.

Article Synopsis
  • Transposable elements (TEs), specifically Short Interspersed Nuclear Elements (SINEs), play a major role in eukaryotic genomes and are challenging to identify due to rapid mutations after insertion.
  • The Highly Divergent Repeat Search Method (HDRSM) outperformed the RepeatMasker program in identifying and accurately determining the boundaries of highly divergent SINE copies in the rice genome, revealing 14,030 hits – with 5,704 missed by RepeatMasker.
  • To achieve a complete understanding of SINE distribution, using both HDRSM and RepeatMasker is advised, as HDRSM excels in detecting divergent copies while RepeatMasker is more effective for shorter, more similar copies.

In this study, we developed a new mathematical method for performing multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.


We analyzed several prokaryotic and eukaryotic genomes for the presence of periodic sequences using a new mathematical method. The method uses random position weight matrices and dynamic programming.


In recent years, a great number of bacterial genomes have been sequenced. One of the most important challenges of computational genomics is now the functional annotation of nucleic acid sequences. In this study, we present a computational method and an annotation system for predicting biological functions using phylogenetic profiles.
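The general idea of a phylogenetic profile can be sketched as follows. This is a minimal illustration, not the annotation system described in the paper: the gene names, the example profiles, and the choice of the Jaccard index as the similarity measure are all assumptions. Each gene is represented by a presence/absence vector over a fixed list of genomes, and genes with similar vectors are candidates for a shared function.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two equal-length presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

def most_similar(query, profiles):
    """Return (gene, score) pairs sorted by descending similarity to query."""
    scores = [
        (gene, jaccard(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical profiles: 1 = a homolog is present in that genome, 0 = absent.
PROFILES = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 0, 1],
    "geneD": [1, 1, 0, 0, 0],
}

if __name__ == "__main__":
    for gene, score in most_similar("geneA", PROFILES):
        print(gene, round(score, 3))
```

In this toy data, `geneB` shares the profile of `geneA` exactly, so an annotation known for one would be the first hypothesis for the other.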


It is known that nucleotide sequences are not fully homogeneous and that this heterogeneity cannot be explained by random fluctuations alone. Such heterogeneity raises the problem of segmenting a sequence into a set of homogeneous parts separated by points called "change points". In this work, we investigated a special case of change points: paired change points (PCP).


Triplet periodicity (TP) is a distinctive feature of the protein coding sequences of both prokaryotic and eukaryotic genomes. In this work, we explored the TP difference inside and between 45 prokaryotic genomes. We constructed two hypotheses of TP distribution on a set of coding sequences and generated artificial datasets that correspond to the hypotheses.
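As a rough illustration of what triplet periodicity means in practice, the sketch below builds a 3 x 4 matrix of base counts per codon position and measures how strongly base identity depends on position. It is an assumption-laden sketch, not the measure used in the paper: the function names are hypothetical, the reading frame is assumed to start at the first base, and mutual information is used here simply as one common way to quantify such dependence.

```python
from collections import Counter
from math import log2

BASES = "ACGT"

def triplet_periodicity_matrix(seq):
    """Count each base at codon positions 0, 1 and 2 (frame starts at index 0)."""
    counts = [Counter() for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        if base in BASES:  # skip ambiguous symbols such as N
            counts[i % 3][base] += 1
    return [[counts[p][b] for b in BASES] for p in range(3)]

def mutual_information(matrix):
    """Mutual information (bits) between codon position and base identity."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[p][b] for p in range(3)) for b in range(4)]
    mi = 0.0
    for p in range(3):
        for b in range(4):
            n = matrix[p][b]
            if n:
                mi += (n / total) * log2(n * total / (row_sums[p] * col_sums[b]))
    return mi

if __name__ == "__main__":
    print(mutual_information(triplet_periodicity_matrix("ATG" * 50)))  # strongly periodic
    print(mutual_information(triplet_periodicity_matrix("A" * 150)))   # no periodicity
```

A value near zero indicates that the base composition is the same at all three positions; larger values indicate a stronger position-specific bias.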


To determine the periodicity of a DNA sequence, different spectral approaches are applied: discrete Fourier transform (DFT), autocorrelation (CORR), information decomposition (ID), hybrid method (HYB), concept of spectral envelope for spectral analysis (SE), normalized autocorrelation (CORR_N), and profile analysis (PA). In this work, we investigated whether these methods can find the true period length, depending on the average number of accumulated changes in DNA bases (PM). The results show that for periods with short length (≤4 b.
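Of the approaches listed, autocorrelation is the simplest to sketch. The code below is a minimal illustration only, not the CORR or CORR_N implementation evaluated in the paper: the function names and the symbol-match scoring are assumptions. For each lag it computes the fraction of positions whose symbol equals the symbol that many positions later, and reports the smallest lag with the highest score as the estimated period.

```python
def match_autocorrelation(seq, max_lag):
    """Fraction of positions i with seq[i] == seq[i + lag], for lag = 1..max_lag."""
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two symbols")
    scores = {}
    for lag in range(1, min(max_lag, n - 1) + 1):
        matches = sum(1 for i in range(n - lag) if seq[i] == seq[i + lag])
        scores[lag] = matches / (n - lag)
    return scores

def estimate_period(seq, max_lag):
    """Smallest lag that attains the highest match score."""
    scores = match_autocorrelation(seq, max_lag)
    best = max(scores.values())
    return min(lag for lag, score in scores.items() if score == best)

if __name__ == "__main__":
    print(estimate_period("ACGTT" * 20, max_lag=12))
```

Multiples of the true period score equally well on a perfect repeat, which is why the smallest best-scoring lag is taken; as substitutions accumulate, the peak flattens and the estimate becomes less reliable.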


The concept of the phase shift of triplet periodicity (TP) was used for searching potential DNA insertions in genes from 17 bacterial genomes. A mathematical algorithm for detection of these insertions has been developed. This approach can detect potential insertions and deletions with lengths that are not multiples of three bases, especially insertions of relatively large DNA fragments (>100 bases).


Triplet periodicity (TP) is a distinctive property of protein coding sequences. Some complex genes have more than one TP type along their sequence. We say that these genes contain a triplet periodicity change point.


The definition of a phase shift of triplet periodicity (TP) is introduced. A mathematical algorithm for detecting TP phase shifts in nucleotide sequences has been developed. Gene sequences from the Kegg-46 data bank were analyzed to search for genes with a TP phase shift.


Latent amino acid repeats seem to be widespread in genetic sequences and to reflect their structure, function, and evolution. We have recently identified latent periodicity in more than 150 protein families including protein kinases and various nucleotide-binding proteins. The latent repeats in these families were correlated to their structure and evolution.


Here, we have applied information decomposition, cyclic profile alignment, and noise decomposition techniques to search for latent repeats within protein families of various functions. We have identified 94 protein families with a family-specific periodicity. In each case, the periodic element was found in greater than 70% of family members.


We identified latent periodicity in catalytic domains of approximately 85% of annotated serine-threonine and tyrosine protein kinases. Similar results were obtained for 22 other protein families and domains. We also designed the method of noise decomposition, which aims to distinguish between different periodicity types of the same period length.


Transfer RNA (tRNA)-like sequences were searched for in the nine basic taxonomic divisions of GenBank-121 (viruses, phages, bacteria, plants, invertebrates, vertebrates, rodents, mammals, and primates) using an original program package that implements a dynamic profile alignment approach for the analysis of genetic texts, with 22 profiles of tRNAs of different isotypes. In total, 175,901 previously unknown tRNA-like sequences were revealed. The locations of the tRNA-like sequences were considered over the regions whose functional meaning is described by standard Feature Keys in GenBank.
