Publications by authors named "Kejue Jia"

The JCVI-Syn3 organism is a minimal organism derived from Mycoplasma mycoides capri, which is capable of self-replication. While the ancestor has 863 genes, the synthetic progeny has only 473, with 434 of these coding for proteins. Despite initial efforts to understand all functions of the organism, a significant number of these protein-coding genes still have unknown functions, and subsequent studies have been only partially successful in elucidating their roles.


Siphonophores (Cnidaria: Hydrozoa) are abundant predators found throughout the ocean and are important constituents of the global zooplankton community. They range in length from a few centimeters to tens of meters. They are gelatinous, fragile, and difficult to collect, so many aspects of the biology of these roughly 200 species remain poorly understood.


Motivation: Presenting the integrated results of bioinformatics research can be challenging and requires sophisticated visualization components, which can be time-consuming to develop. This article presents a new way to effectively communicate research findings.

Results: We have developed a static web page generator, JSONWP, which is specifically designed for protein bioinformatics research.


Understanding how protein sequences relate to protein function is extremely important. Sequence alignment is one of the most basic operations in bioinformatics, and usually the first things learned from an alignment are which positions are the most conserved; these are often critical parts of the structure, such as enzyme active site residues. In addition, the contact pairs in a protein usually correspond closely to correlations between residue positions in the multiple sequence alignment. These paired positions usually change in a systematic and coordinated way: if one position changes, the other member of the pair also changes to compensate.
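The two quantities mentioned above, per-position conservation and correlation between pairs of positions, can be illustrated with a minimal sketch. It assumes Shannon entropy as the conservation score and plain mutual information as the correlation score; the abstract does not say which measures the article uses, and the toy alignment `msa` and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns of equal length."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Hypothetical toy alignment: four sequences, four positions.
# Positions 1 and 3 vary together (K with D, E with K); positions 0 and 2 are conserved.
msa = ["AKLD", "AKLD", "AELK", "AELK"]
columns = list(zip(*msa))

conservation = [column_entropy(col) for col in columns]
coupled = mutual_information(columns[1], columns[3])
uncoupled = mutual_information(columns[0], columns[1])
```

In this toy case the conserved columns have zero entropy, and the covarying pair carries more mutual information than a pair that includes a conserved column. Real alignments generally need corrections for phylogenetic bias and small sample sizes, which this sketch omits.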


Cadherin intermolecular interactions are critical for cell-cell adhesion and play essential roles in tissue formation and the maintenance of tissue structures. In this study, we focus on E-cadherin, a classical cadherin that connects epithelial cells, to understand how E-cadherin molecules interact in cis and trans conformations, that is, when attached to the same cell or to opposing cells, respectively. We employ coevolutionary sequence analysis and molecular dynamics simulations to confirm previously known interaction sites as well as to identify new interaction sites.


The sequence correlations within a protein multiple sequence alignment are routinely used to predict contacts within its structure, but here we point out that these data can also be used to predict a protein's dynamics directly. The elastic network protein dynamics models rely directly upon the contacts, and the normal modes of motion are obtained from the decomposition of the inverse of the contact map. To make the direct connection between sequence and dynamics, it is necessary to coarse-grain the structure at the level of one point per amino acid. This has often been done, and coarse-grained protein dynamics from elastic network models have been highly successful, particularly in representing the large-scale motions of proteins that usually relate closely to their functions.
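As a rough illustration of how an elastic network model depends only on contacts, the sketch below builds the connectivity (Kirchhoff) matrix used in Gaussian network models from one point per residue. The 7.0 Å cutoff, the function name, and the toy coordinates are assumptions for illustration and are not taken from the article; the decomposition step that yields the normal modes is left out because it needs a linear algebra library.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the Kirchhoff (connectivity) matrix for one point per residue.

    Off-diagonal entries are -1 for pairs within the cutoff distance and 0
    otherwise; each diagonal entry is the number of contacts of that residue.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


# Hypothetical coordinates: four points on a line, 3.8 Å apart,
# so only sequence neighbors fall within the cutoff.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(coords)
```

Every row of this matrix sums to zero, which is why the matrix is singular and a pseudo-inverse (or an eigendecomposition that skips the zero mode) is used in practice. If contacts were predicted from sequence correlations instead of measured from coordinates, the same matrix could in principle be filled in without a structure.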


There are several hundred million protein sequences, but the relationships among them are not fully available from existing homolog detection methods. There is an essential need for an improved method to push homolog detection to lower levels of sequence identity. The method used here relies on a language model to represent proteins numerically in a matrix (an embedding) and uses discrete cosine transforms to compress the data to extract the most essential part, significantly reducing the data size.
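The compression idea can be sketched as follows, assuming the embedding is a matrix with one row per residue and one column per embedding channel, and that compression means keeping only the lowest-frequency DCT-II coefficients along the sequence axis. The function names, the `keep` parameter, and the toy input are hypothetical; the article's actual pipeline may differ.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II of a 1D sequence of numbers."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def compress_embedding(embedding, keep):
    """Keep the first `keep` DCT coefficients of each embedding channel.

    `embedding` is a list of rows (one per residue); the result has `keep`
    rows regardless of protein length, assuming the protein has at least
    `keep` residues.
    """
    channels = zip(*embedding)  # iterate over columns (channels)
    coeffs = [dct2(list(channel))[:keep] for channel in channels]
    return [list(row) for row in zip(*coeffs)]


# Hypothetical toy embedding: 6 residues, 2 channels, constant along the sequence.
embedding = [[1.0, 2.0] for _ in range(6)]
compressed = compress_embedding(embedding, keep=3)
```

Because the output size depends on `keep` rather than on sequence length, proteins of different lengths map to matrices of the same shape, which makes them straightforward to compare.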


Measuring the dependence of ≥ 3 random variables and drawing inference from such higher-order dependences are scientifically important yet challenging. Motivated here by protein coevolution with multivariate categorical features, we consider an information theoretic measure of higher-order dependence. The proposed collective dependence is a symmetrization of differential interaction information which generalizes the mutual information of a pair of random variables.
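For orientation, the sketch below computes ordinary three-variable interaction information from samples, under the sign convention I(X;Y|Z) - I(X;Y). It is not the collective dependence measure proposed in the article, whose exact definition is not given in this abstract; the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy (bits) of a list of hashable observations."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def interaction_information(xs, ys, zs):
    """Interaction information I(X;Y|Z) - I(X;Y), in bits (one common sign convention)."""
    xy = list(zip(xs, ys))
    xz = list(zip(xs, zs))
    yz = list(zip(ys, zs))
    xyz = list(zip(xs, ys, zs))
    conditional_mi = entropy(xz) + entropy(yz) - entropy(xyz) - entropy(zs)
    mi = entropy(xs) + entropy(ys) - entropy(xy)
    return conditional_mi - mi


# Toy data: X and Y are independent bits and Z = X XOR Y, so the dependence
# is purely third-order and invisible to any pairwise measure.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
zs = [x ^ y for x, y in zip(xs, ys)]
synergy = interaction_information(xs, ys, zs)

# Three identical copies of one bit give purely redundant dependence.
redundancy = interaction_information([0, 1], [0, 1], [0, 1])
```

Under this convention the XOR example gives a positive value and the redundant example a negative one, which shows why pairwise mutual information alone can miss or misjudge dependence among three or more variables.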


Two new computational approaches are described to aid in the design of new peptide-based drugs: evaluating ensembles of protein structures derived from their dynamics, and assessing structures using empirical contact potentials. These approaches build on the concept that conformational variability can aid in the binding process and, for disordered proteins, can even facilitate the binding of more diverse ligands. This latter consideration indicates that such a design process should be less restrictive so that multiple inhibitors might be effective.


Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered for amino acid similarities. At present, sequence matching compares sequences based only upon the similarities of single amino acids, ignoring the fact that in densely packed proteins there are additional conservative substitutions representing exchanges between two interacting amino acids, such as a small-large pair changing to a large-small pair, substitutions that are not individually so conservative.


Protein functional mechanisms usually require conformational changes, and often there are known structures for the different conformational states. However, usually neither the origin of the driving force nor the underlying pathways for these conformational transitions is known. Exothermic chemical reactions may be an important source of forces that drive conformational changes.


Evaluating protein structures requires reliable free energies with good estimates of both potential energies and entropies. Although there are many demonstrated successes from using knowledge-based potential energies, computing entropies of proteins has lagged far behind. Here we take an entirely different approach and evaluate knowledge-based conformational entropies of proteins based on the observed frequencies of contact changes between amino acids in a set of 167 diverse proteins, each of which has two alternative structures.


The essential aspects of the ribosome's mechanism can be extracted from coarse-grained simulations, including the ratchet motion, the movement together of critical bases at the decoding center, and movements of the peptide tunnel lining that assist in the expulsion of the synthesized peptide. Because the ribosome is so large, coarse graining helps to simplify its mechanism and aids in understanding it. Results presented here utilize coarse-grained elastic network modeling to extract the dynamics, and both RNAs and proteins are coarse grained.


The number of solved protein structures deposited in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein.
