This paper discusses the properties of proteins and their relations in the interactomes of selected subsets of the SARS-CoV-2 proteome: the membrane protein, the nonstructural proteins, and, finally, the full proteome. Protein disorder according to several measures, liquid-liquid phase separation probabilities, and protein node degrees in the interaction networks were singled out as the features of interest. Additionally, viral interactomes were combined with the interactome of human lung tissue to examine whether the new connections in the resulting viral-host interactome are linked to protein disorder.
Background: In the last decade and a half it has been firmly established that a large number of proteins do not adopt a well-defined (ordered) structure under physiological conditions. Such intrinsically disordered proteins (IDPs) and intrinsically disordered (protein) regions (IDRs) are involved in essential cell processes through two basic mechanisms: the entropic chain mechanism, which is responsible for rapid fluctuations among many alternative conformations, and molecular recognition via short recognition elements that bind to other molecules. IDPs possess a high adaptive potential, and there is special interest in investigating their involvement in organism evolution.
DNA repeats have great importance for biological research, and a large number of tools for determining repeats have been developed. Herein we define a method for extracting a statistically significant subset of a determined set of repeats. Our aim was to identify the subset of repeats in the input sequences that would not be expected to occur as many times as observed in a random sequence of the same length.
To associate phenotypic characteristics of an organism with molecules encoded by its genome, there is a need for well-structured genotype and phenotype data. We use a novel method for extracting data on phenotype and genotype characteristics of microorganisms from text. As a resource, we use an encyclopedia of microorganisms, which holds phenotypic and genotypic data, and create a structured, flexible data resource that can be exported to a range of database formats and contains genotype and phenotype data for 2412 species and 873 genera of microbes.
Background: A significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3D structure or contain regions that do not possess a well-defined 3D structure. It has also been proven that a protein's disorder content is related to its function. We have performed an exhaustive analysis and comparison of the disorder content of proteins from prokaryotic organisms (i.
Using data from the Protein Data Bank, the correlations between the primary and secondary structures of proteins were analyzed. The correlation values of the amino acids and the eight secondary structure types were calculated, where the position of the amino acid and the position in sequence with the particular secondary structure differ by at most 25. The diagrams describing these results indicate that correlations are significant at distances between -9 and 10.
Comput Methods Programs Biomed
March 2009
The paper presents a novel, n-gram-based method for the analysis of bacterial genome segments known as genomic islands (GIs). Identification of GIs in bacterial genomes is an important task, since many of them represent inserts that may contribute to bacterial evolution and pathogenesis. In order to characterize GIs and distinguish them from the rest of the genome, binary classification of islands based on n-gram frequency distribution has been performed.
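As a rough illustration of what an n-gram frequency distribution over a genome segment looks like, the following Python sketch counts overlapping n-grams and normalizes them to relative frequencies. The function names and the L1 distance used to compare two distributions are assumptions made for this example; the abstract does not specify which classifier or distance the paper uses.

```python
from collections import Counter


def ngram_frequencies(sequence: str, n: int) -> dict[str, float]:
    """Relative frequency of every overlapping n-gram in `sequence`."""
    seq = sequence.upper()
    total = len(seq) - n + 1  # number of overlapping windows of length n
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


def l1_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Hypothetical comparison measure: sum of absolute frequency differences."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    segment = ngram_frequencies("ACGTAC", 2)
    background = ngram_frequencies("ACACAC", 2)
    print(segment)
    print(l1_distance(segment, background))
```

A segment whose distribution lies far from that of the surrounding genome would be a candidate island under this simplified view; any threshold would have to be chosen on training data.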
The correlation between the primary and secondary structures of proteins was analysed using a large data set from the Protein Data Bank. Clear preferences of amino acids towards certain secondary structures classify amino acids into four groups: alpha-helix preferrers, strand preferrers, turn and bend preferrers, and His and Cys (the latter two amino acids show no clear preference for any secondary structure). Amino acids in the same group have similar structural characteristics at their Cbeta and Cgamma atoms that predict their preference for a particular secondary structure.
There are two approaches to identifying genomic and pathogenesis islands (GI/PAIs) in bacterial genomes: the compositional and the functional, based on DNA or protein level composition and gene function, respectively. We applied n-gram analysis in addition to other compositional features, combined them by union and intersection, and defined two measures for evaluating the results: recall and precision. Using the best criteria (by training on the Escherichia coli O157:H7 EDL933 genome), we predicted GIs for 14 Enterobacteriaceae family members and for 21 randomly selected bacterial genomes.
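The effect of combining features by union versus intersection on recall and precision can be sketched as follows. This is a minimal example that assumes predictions and the reference annotation are both represented as sets of genome positions; the abstract does not state the unit of evaluation, and the feature names and data here are hypothetical.

```python
def precision_recall(predicted: set[int], reference: set[int]) -> tuple[float, float]:
    """Precision and recall of predicted positions against reference positions."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical positions flagged by two compositional features.
    feature_a = {1, 2, 3, 4}
    feature_b = {3, 4, 5, 6}
    reference = {3, 4, 5}  # hypothetical known island positions

    # Union flags a position if either feature does.
    print(precision_recall(feature_a | feature_b, reference))
    # Intersection flags a position only if both features do.
    print(precision_recall(feature_a & feature_b, reference))
```

In this toy data the union recovers every reference position at the cost of more false positives, while the intersection makes no false predictions but misses one position, which is the usual trade-off between the two combinations.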
Genomics Proteomics Bioinformatics
February 2005
A dataset of 103 SARS-CoV isolates (101 human patients and 2 palm civets) was investigated on different aspects of genome polymorphism and isolate classification. The number and the distribution of single nucleotide variations (SNVs) and insertions and deletions, with respect to a "profile", were determined and discussed ("profile" being a sequence containing the most represented letter per position). Distribution of substitution categories per codon positions, as well as synonymous and non-synonymous substitutions in coding regions of annotated isolates, was determined, along with amino acid (a.
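The "profile" described above (the most represented letter per position) can be sketched in Python as below. The sketch assumes the isolates have already been aligned to equal length, with "-" marking gaps; the alignment step, the function names, and the toy sequences are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def build_profile(aligned: list[str]) -> str:
    """Most represented letter at each position of equal-length aligned sequences."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    # zip(*aligned) iterates over alignment columns.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def variations(profile: str, isolate: str) -> list[tuple[int, str, str]]:
    """(0-based position, profile letter, isolate letter) wherever they differ."""
    return [
        (pos, ref, obs)
        for pos, (ref, obs) in enumerate(zip(profile, isolate))
        if ref != obs
    ]


if __name__ == "__main__":
    isolates = ["ACGT", "ACGA", "TCGT"]  # toy aligned sequences
    profile = build_profile(isolates)
    for isolate in isolates:
        print(isolate, variations(profile, isolate))
```

Positions where the isolate letter is "-" would correspond to deletions relative to the profile, and the remaining differences to single nucleotide variations, under the stated alignment assumption.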
Background: We have compared 38 isolates of the SARS-CoV complete genome. The main goal was twofold: first, to analyze and compare nucleotide sequences and to identify positions of single nucleotide polymorphisms (SNPs), insertions, and deletions, and second, to group them according to sequence similarity, eventually pointing to the phylogeny of SARS-CoV isolates. The comparison is based on genome polymorphism such as insertions or deletions and the number and positions of SNPs.