Publications by authors named "Boris Jankovic"

Identification of genomic signals as indicators of functional genomic elements is one of the areas where machine learning methods found early and widespread application. Over time, the methods applied grew in variety and generally improved in their ability to identify major genomic and transcriptomic signals. The evolution of machine learning in genomics followed a path similar to that of machine learning applications in other fields.

Background: Accurate identification of exon/intron boundaries is critical for the correct annotation of genes with multiple exons. Donor and acceptor splice sites (SS) demarcate these boundaries. Therefore, accurate computational models for predicting SS are useful for the functional annotation of genes and genomes, and for finding alternative SS associated with different diseases.
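
To make the scale of the splice-site problem concrete, here is a minimal sketch, assuming only canonical GT-AG introns (non-canonical sites such as GC-AG are ignored). It enumerates every GT dinucleotide as a candidate donor and every AG as a candidate acceptor, together with a short flanking window that a classifier could then score. The function name `candidate_splice_sites` and the `flank` parameter are hypothetical and are not taken from the publications listed here.

```python
def candidate_splice_sites(seq: str, flank: int = 5):
    """Return (kind, position, window) for every GT (candidate donor) and
    AG (candidate acceptor) dinucleotide, assuming canonical GT-AG introns."""
    seq = seq.upper()
    candidates = []
    for i in range(len(seq) - 1):
        dinuc = seq[i:i + 2]
        if dinuc == "GT":
            kind = "donor"  # intron would start here
        elif dinuc == "AG":
            kind = "acceptor"  # intron would end here
        else:
            continue
        # Clip the flanking window at the sequence boundaries.
        start = max(0, i - flank)
        end = min(len(seq), i + 2 + flank)
        candidates.append((kind, i, seq[start:end]))
    return candidates


for kind, pos, window in candidate_splice_sites("CAGGTAAGTCCTTTCAGG"):
    print(kind, pos, window)
```

Even this 18-nt toy sequence yields five candidates. Real splice sites are a small minority of such look-alikes, which is why a learned model is needed to separate them.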

Polyadenylation signals (PAS) are found in most protein-coding and some non-coding genes in eukaryotes. Their accurate recognition improves our understanding of gene regulation mechanisms and helps identify the 3'-end of transcribed gene regions, where premature or alternative transcription termination may lead to various diseases. Although different methods and tools for in-silico prediction of genomic signals have been proposed, correctly identifying PAS in genomic DNA remains challenging because of the vast number of non-relevant hexamers identical to PAS hexamers.
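
The PAS difficulty can be illustrated with a short sketch. It scans one strand of a sequence for two commonly cited PAS hexamer variants, AATAAA and ATTAAA, and estimates how many chance matches a random sequence of the same length would contain. The hexamer list, the function names, and the uniform, independent base model used for the estimate are illustrative assumptions. Real genomes are not uniform (many are AT-rich), so actual counts differ.

```python
# Two frequently reported PAS hexamer variants; real predictors consider more.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas_hexamers(seq, hexamers=PAS_HEXAMERS):
    """Return (position, hexamer) for every match on the given strand, overlaps allowed."""
    seq = seq.upper()
    return [(i, seq[i:i + 6]) for i in range(len(seq) - 5) if seq[i:i + 6] in hexamers]


def expected_random_hits(length, n_hexamers=len(PAS_HEXAMERS)):
    """Expected chance matches on one strand of a uniform i.i.d. random sequence."""
    windows = max(0, length - 5)  # number of 6-nt windows
    return windows * n_hexamers / 4 ** 6


print(find_pas_hexamers("GCAATAAAGGATTAAATTAAAC"))
print(expected_random_hits(1_000_000))
```

Under this simplified model, 1 Mb of sequence contains roughly 488 chance matches per strand. That is far more than the number of genuine signals, so the hexamer alone cannot identify a PAS.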

Motivation: Recognition of different genomic signals and regions (GSRs) in DNA is crucial for understanding genome organization, gene regulation, and gene function, which in turn enables better genome and gene annotations. Although many methods have been developed to recognize GSRs, their purely computational identification remains challenging. Moreover, different GSRs usually require a specialized set of features for developing robust recognition models.

Identifying transcription factor (TF) binding sites (TFBSs) is important for the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false-positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs because of the large number of TFs involved.
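
Here is a minimal sketch of PWM-based scanning. It assumes a log-odds matrix built from per-position base counts with a pseudocount and a uniform 0.25 background. The toy motif counts, the function names, and the thresholds are hypothetical. Lowering the threshold lets windows with mismatches through, which is one way the high false-positive rate described above arises.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from a list of {base: count} dicts, one per motif position."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for each window whose log-odds score >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguous symbols
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


toy_pwm = log_odds_pwm([{"T": 8}, {"A": 8}, {"T": 8}, {"A": 8}])
print(scan("GCTATAGG", toy_pwm, 5.0))
print(scan("TATT", toy_pwm, 3.0))
```

With this toy TATA-like matrix, the exact consensus scores 4·log2(3) ≈ 6.34 bits and a one-mismatch window scores about 3.17 bits. A threshold of 5.0 therefore keeps only exact matches, while 3.0 also admits windows such as TATT.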

Background: Finding a source from which high-energy-density biofuels can be derived at an industrial scale has become an urgent challenge for renewable energy production. Some microorganisms can produce free fatty acids (FFA) as precursors towards such high-energy-density biofuels. In particular, photosynthetic cyanobacteria are capable of directly converting carbon dioxide into FFA.

Honey bee colonies exhibit an age-related division of labor, with worker bees performing discrete sets of behaviors throughout their lifespan. These behavioral states are associated with distinct brain transcriptomic states, yet little is known about the regulatory mechanisms governing them. We used CAGEscan (a variant of the Cap Analysis of Gene Expression technique) for the first time to characterize the promoter regions of differentially expressed brain genes during two behavioral states, brood care (also known as "nursing") and foraging, and identified transcription factors (TFs) that may govern their expression.

The study of proteomes provides new insights into stimulus-specific responses of protein synthesis and turnover, and the role of post-translational modifications at the systems level. Due to the diverse chemical nature of proteins and shortcomings in the analytical techniques used in their study, only a partial display of the proteome is achieved in any study, and this holds particularly true for plant proteomes. Here we show that different solubilization and separation methods have profound effects on the resulting proteome.

Metagenomics-based functional profiling analysis is an effective means of gaining deeper insight into the composition of marine microbial populations and developing a better understanding of the interplay between the functional genome content of microbial communities and abiotic factors. Here we present a comprehensive analysis of 24 datasets covering surface and depth-related environments at 11 sites around the world's oceans. The complete dataset comprises approximately 12 million sequences, totaling 5,358 Mb.

Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles.

Background: Initiation of transcription is essential for most cellular responses to environmental conditions and for cell and tissue specificity. This process is regulated by numerous proteins, their ligands and mutual interactions, as well as by interactions with DNA. The key regulatory proteins of this kind are transcription factors (TFs) and transcription co-factors (TcoFs).

Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential for improved genome annotation and a better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the nucleotide sequences surrounding candidate poly(A) motifs can, to a large extent, differentiate true motifs from false ones.
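
The idea that flanking sequence carries discriminative information can be sketched as a feature extractor. It turns the upstream and downstream regions of a candidate motif into fixed-length k-mer frequency vectors that a standard classifier could consume. The window size (`flank=100`), `k=2`, and the function names are hypothetical choices for illustration. They are not the feature sets used by the predictors described here.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Normalized frequency of every A/C/G/T k-mer in seq, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(1, sum(counts[m] for m in kmers))  # ignore k-mers with N etc.
    return [counts[m] / total for m in kmers]


def flank_features(seq, motif_pos, motif_len=6, flank=100, k=2):
    """Concatenate k-mer profiles of the regions just upstream and downstream of a candidate motif."""
    upstream = seq[max(0, motif_pos - flank):motif_pos].upper()
    downstream = seq[motif_pos + motif_len:motif_pos + motif_len + flank].upper()
    return kmer_profile(upstream, k) + kmer_profile(downstream, k)


# Candidate AATAAA at position 4, with 4-nt flanks and single-base frequencies.
print(flank_features("AAAAAATAAACCCC", 4, flank=4, k=1))
```

With `k=2`, each flank contributes 16 values, so each candidate becomes a 32-dimensional vector. Ambiguous bases such as N are simply not counted.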

Background: Increasing structural and biochemical evidence suggests that post-translational methionine oxidation of proteins is not just a result of cellular damage but may provide the cell with information on the cellular oxidative status. In addition, oxidation of methionine residues in key regulatory proteins, such as calmodulin, does influence cellular homeostasis. Previous findings also indicate that oxidation of methionine residues in signaling molecules may have a role in stress responses since these specific structural modifications can in turn change biological activities of proteins.

Summary: In higher eukaryotes, the identification of translation initiation sites (TISs) has been focused on finding these signals in cDNA or mRNA sequences. Using Arabidopsis thaliana (A.t.

Mutations in any genome may lead to phenotypic characteristics that determine an individual's ability to adapt to environmental challenges. In studies of human biology, among the most interesting are the phenotypic characteristics that determine responses to drug treatments, responses to infections, or predisposition to specific inherited diseases. Most research in this field has focused on the effects of mutations on the final gene products, peptides, and their alterations.

Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, identifying the poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and for understanding gene regulation mechanisms.

Background: Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions as well as contribute to a better understanding of the complex machinery involved in gene regulation.
