The degree of dissimilarity between genome sequences of homologous species is a measure of the evolutionary distance between them. It serves as a metric in the construction of phylogenetic trees, which depict the evolutionary relationships and common ancestry among different species. Given two genome sequences, evolutionary distance is determined by estimating the number of global mutations that transform one sequence into the other.
Globular proteins typically fold into tightly packed arrays of regular secondary structures. We developed a model to approximate the compact parallel and antiparallel arrangement of α-helices and β-strands, enumerated all possible topologies formed by up to five secondary structural elements (SSEs), searched for their occurrence in spatial structures of proteins, and documented their frequencies of occurrence in the PDB. The enumeration model grows larger super-secondary structure patterns (SSPs) by combining pairs of smaller patterns, a process that approximates a potential path of protein fold evolution.
Recent technological advances in genomics now allow biological data to be produced at unprecedented tera- and petabyte scales. Yet the extraction of useful knowledge from these voluminous data presents a significant challenge to the scientific community. Efficient mining of vast and complex data sets for the needs of biomedical research critically depends on seamless integration of clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships accumulated in a plethora of publicly available databases.
Background: Numerous types of clustering, such as single linkage and K-means, have been widely studied and applied to a variety of scientific problems. However, the existing methods are not readily applicable to problems that demand high stringency.
Methods: Our method, self consistency grouping, i.
A k-bounded (k ≥ 2) transposition is an operation that switches two elements that have at most k - 2 elements in between. We study the problem of sorting a circular permutation π of length n for k = 2, i.e.
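The definition of a k-bounded transposition given above can be illustrated with a minimal Python sketch. The function names `gap_between` and `k_bounded_transpose` are hypothetical, and measuring the gap along the shorter arc of the circle is an assumption made here for illustration; the abstract does not state how positions are measured on a circular permutation.

```python
def gap_between(i, j, n, circular=True):
    """Count the elements strictly between positions i and j.

    Assumption: on a circular permutation the gap is measured along
    the shorter of the two arcs connecting the positions.
    """
    d = abs(i - j)
    if circular:
        d = min(d, n - d)
    return d - 1


def k_bounded_transpose(perm, i, j, k, circular=True):
    """Return a copy of perm with positions i and j switched.

    The switch is allowed only if at most k - 2 elements lie between
    the two positions; otherwise a ValueError is raised.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if i == j:
        raise ValueError("positions must differ")
    if gap_between(i, j, len(perm), circular) > k - 2:
        raise ValueError("positions are too far apart for this k")
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    # For k = 2 only neighbouring elements may be switched.
    print(k_bounded_transpose([3, 1, 2, 4], 0, 1, k=2))
    # On a circle, the first and last positions are also neighbours.
    print(k_bounded_transpose([3, 1, 2, 4], 0, 3, k=2))
```

For k = 2 the operation reduces to switching adjacent elements, so under the circular reading the first and last positions count as neighbours, while the same call with `circular=False` is rejected.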
Assessing structural similarity and defining common regions through comparison of protein spatial structures is an important task in functional and evolutionary studies of proteins. There are many servers that compare structures and define sub-structures in common between proteins through superposition and closeness of either coordinates or contacts. However, a natural way to analyze a structure for experts working on structure classification is to look for specific three-dimensional (3D) motifs and patterns instead of finding common features in two proteins.