Publications by authors named "Sung Hou Kim"

Article Synopsis
  • The SARS-CoV-2 virus, which caused the COVID-19 pandemic, has been sequenced over 16 million times, revealing many new viral lineages and genetic variants.
  • In a retrospective study, researchers analyzed unique viral sequences and found around 44,000 variants that grouped into four main genomic categories, each with distinct genetic traits.
  • The findings suggest that certain highly conserved variant-genotypes could help predict future viral mutations, which is significant for developing effective panvalent vaccines for COVID-19 and other viruses.
View Article and Find Full Text PDF

All current categorizations of the human population, such as ethnicity, ancestry and race, are based on various selections and combinations of complex and dynamic common characteristics that are mostly societal and cultural in nature, as perceived by members within or outside the categorized group. During the last decade, a massive amount of a new type of characteristic, exclusively genomic in nature, has become available, allowing us to analyze the inherited whole-genome demographics of extant humans, especially in fields such as human genetics, health sciences and medical practices (e.g.

An organism tree of life (organism ToL) is a conceptual and metaphorical tree to capture a simplified narrative of the evolutionary course and kinship among the extant organisms. Such a tree cannot be experimentally validated but may be reconstructed based on characteristics associated with the organisms. Since the whole-genome sequence of an organism is, at present, the most comprehensive descriptor of the organism, a whole-genome sequence-based ToL can be an empirically derivable surrogate for the organism ToL.

Background: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment.

Results: Here, we present a community resource (http://afproject.

Prevention and early intervention are the most effective ways of avoiding or minimizing psychological, physical, and financial suffering from cancer. However, such proactive action requires the ability to predict the individual's susceptibility to cancer with a measure of probability. Of the triad of cancer-causing factors (inherited genomic susceptibility, environmental factors, and lifestyle factors), the inherited genomic component may be derivable from the recent public availability of a large body of whole-genome variation data.

Fungi belong to one of the largest and most diverse kingdoms of living organisms. The evolutionary kinship within a fungal population has so far been inferred mostly from the gene-information-based trees ("gene trees"), constructed commonly based on the degree of differences of proteins or DNA sequences of a small number of highly conserved genes common among the population by a multiple sequence alignment (MSA) method. Since each gene evolves under different evolutionary pressure and time scale, it has been known that one gene tree for a population may differ from other gene trees for the same population depending on the subjective selection of the genes.

An empirical approach is presented for predicting the genomic susceptibility of an individual to the most likely one among nine traits, consisting of eight major cancer classes plus a healthy trait. We use four prediction methods by applying two supervised learning algorithms to two different descriptors of common genomic variations (the profiles of genotypes of SNPs and SNP syntaxes with low P values or low frequencies) of each individual genome from normal cells. All four methods made correct predictions substantially better than random predictions for most cancer classes, but not for some others.

The potential for pluripotent cells to differentiate into diverse specialized cell types has given much hope to the field of regenerative medicine. Nevertheless, the low efficiency of cell commitment has been a major bottleneck in this field. Here we provide a strategy to enhance the efficiency of early differentiation of pluripotent cells.

A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenic distances. We present two phylogenies that accentuate different aspects of E.

Despite the safety and feasibility of mesenchymal stem cell (MSC) therapy, an optimal cell type has not yet emerged in terms of electromechanical integration in infarcted myocardium. We found that poor to moderate survival benefits of MSC-implanted rats were caused by incomplete electromechanical integration induced by tissue heterogeneity between myocytes and engrafted MSCs in the infarcted myocardium. Here, we report the development of cardiogenic cells from rat MSCs activated by phorbol myristate acetate, a PKC activator, that exhibited high expressions of cardiac-specific markers and Ca(2+) homeostasis-related proteins and showed adrenergic receptor signaling by norepinephrine.

We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of frequencies of all features. The selection of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes (strictly speaking, a proteome phylogeny).
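
As a rough illustration of the profile idea described in this abstract, the sketch below counts overlapping l-mers in a sequence, normalizes the counts into a frequency profile, and compares two profiles. It is a minimal sketch, not the authors' implementation: the function names `ffp` and `js_divergence`, the toy sequences, and the choice of Jensen-Shannon divergence as the dissimilarity measure are all assumptions made here for illustration, and the sketch does not attempt the optimal feature-length selection the abstract mentions.

```python
from collections import Counter
from math import log2


def ffp(sequence: str, l: int) -> dict[str, float]:
    """Return the normalized frequencies of all overlapping l-mers (hypothetical helper)."""
    if l <= 0 or len(sequence) < l:
        raise ValueError("l must be positive and no longer than the sequence")
    # Slide a window of width l across the sequence and count each feature.
    counts = Counter(sequence[i:i + l] for i in range(len(sequence) - l + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two profiles.

    Assumption: this measure is chosen for illustration only; the abstract
    above does not say how profiles are compared.
    """
    divergence = 0.0
    for feature in set(p) | set(q):
        pf = p.get(feature, 0.0)
        qf = q.get(feature, 0.0)
        mean = 0.5 * (pf + qf)
        if pf > 0:
            divergence += 0.5 * pf * log2(pf / mean)
        if qf > 0:
            divergence += 0.5 * qf * log2(qf / mean)
    return divergence


if __name__ == "__main__":
    # Toy amino-acid strings, invented for this example (not real proteomes).
    proteome_a = "MKVLAAGIVMKVLAAG"
    proteome_b = "MKVLSAGIVMRVLAAG"
    profile_a = ffp(proteome_a, 3)
    profile_b = ffp(proteome_b, 3)
    print(js_divergence(profile_a, profile_b))
```

With base-2 logarithms the divergence lies between 0 (identical profiles) and 1 (profiles with no shared features), so a matrix of pairwise values could in principle be passed to a standard tree-building method.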

Ten complete mammalian genome sequences were compared by using the "feature frequency profile" (FFP) method of alignment-free comparison. This comparison technique reveals that the whole nongenic portion of mammalian genomes contains evolutionary information that is similar to their genic counterparts: the intron and exon regions. We partitioned the complete genomes of mammals (such as human, chimp, horse, and mouse) into their constituent nongenic, intronic, and exonic components.

The vast sequence divergence among different virus groups has presented a great challenge to alignment-based sequence comparison among different virus families. Using an alignment-free comparison method, we construct the whole-proteome phylogeny for a population of viruses from 11 viral families comprising 142 large dsDNA eukaryote viruses. The method is based on the feature frequency profiles (FFP), where the length of the feature (l-mer) is selected to be optimal for phylogenomic inference.

We have analyzed the interstitial water (ISW) structures in 1500 protein crystal structures deposited in the Protein Data Bank that have greater than 1.5 Å resolution with less than 90% sequence similarity with each other. We observed varieties of polygonal water structures composed of three to eight water molecules.

The Brn-5 protein, highly expressed in human brain, belongs to the POU family, a class of transcription factors involved in a wide variety of biological processes ranging from programming of embryonic stem cells to cellular housekeeping. This functional diversity is conferred by two DNA-binding subdomains that can assume several configurations due to a bipartite arrangement of POU-specific (POU(S)) and POU-homeo (POU(H)) subdomains separated by a linker region. The crystal structure of human Brn-5 transcription factor in complex with corticotrophin-releasing hormone (CRH) gene promoter reveals an unexpected recognition mode of the protein to its cognate DNA.

Nanog and Sox2 are key transcriptional factors involved in self-renewal and pluripotency of stem cells in humans and other mammals. Nanog and Sox2 contain a homeodomain (HD) and a high-mobility group (HMG) DNA-binding domain, respectively, which target them to their regulatory regions, as well as other regions that confer transactivation function by providing sites for recruiting other transcriptional regulators. To gain insight into the biochemical and biophysical characteristics of these other regions of Nanog and Sox2, we have tried to overproduce and purify full-length wild-type human Nanog and Sox2 expressed in Escherichia coli.

The Protein Data Bank file format is the format most widely used by protein crystallographers and biologists to disseminate and manipulate protein structures. Despite this, there are few user-friendly software packages available to efficiently edit and extract raw information from PDB files. This limitation often leads to many protein crystallographers wasting significant time manually editing PDB files.

For comparison of whole-genome (genic + nongenic) sequences, multiple sequence alignment of a few selected genes is not appropriate. One approach is to use an alignment-free method in which feature (or l-mer) frequency profiles (FFP) of whole genomes are used for comparison, a variation of a text or book comparison method that uses word frequency profiles. In this approach it is critical to identify the optimal resolution range of l-mers for the given set of genomes compared.

Many environmentally important photo- and chemolithoautotrophic bacteria accumulate globules of polymeric, water-insoluble sulfur as a transient product during oxidation of reduced sulfur compounds. Oxidation of this sulfur requires the concerted action of Dsr proteins. However, individual functions and interplay of these proteins are largely unclear.

We have obtained precatalytic (enzyme-substrate complex) and postcatalytic (enzyme-product complex) crystal structures of an active full-length hammerhead RNA that cleaves in the crystal. Using the natural satellite tobacco ringspot virus hammerhead RNA sequence, the self-cleavage reaction was modulated by substituting the general base of the ribozyme, G12, with A12, a purine variant with a much lower pKa that does not significantly perturb the ribozyme's atomic structure. The active, but slowly cleaving, ribozyme thus permitted isolation of enzyme-substrate and enzyme-product complexes without modifying the nucleophile or leaving group of the cleavage reaction, nor any other aspect of the substrate.

Immobilized metal ion affinity chromatography (IMAC) has become one of the most popular protein purification methods for recombinant proteins with a hexa-histidine tag (His-tag) placed at the C- or N-terminus of proteins. Nevertheless, there are always difficult proteins that show weak binding to the metal chelating resin and thus low purity. These difficulties are often overcome by increasing the His-tag to 8 or 10 histidines.

The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes.

Important cellular processes such as cell fate are likely to be controlled by an elaborate orchestration of multiple signaling pathways, many of which are still not well understood or known. Because protein kinases, the members of a large family of proteins involved in modulating many known signaling pathways, are likely to play important roles in balancing multiple signals to modulate cell fate, we focused our initial search for chemical reagents that regulate stem cell fate among known inhibitors of protein kinases. We have screened 41 characterized inhibitors of six major protein kinase subfamilies to alter the orchestration of multiple signaling pathways involved in differentiation of stem cells.

The Brn-5 protein plays an important role in the control of cellular development and belongs to a class of transcription factors that usually contain two domains: the POU homeodomain (POU(HD)) and the POU-specific domain (POU(S)). Since high-quality crystals suitable for crystallographic studies of the proteins of this class are difficult to obtain, all the known structural information available is for POU(HD) and/or POU(S). This paper describes several critical steps that allowed the production of high-quality crystals of the full-length Brn-5 protein complexed with its cognate DNA.
