Publications by authors named "Hokeun Sun"

Rare variant association studies with multiple traits or diseases have drawn considerable attention, since association signals of rare variants can be boosted if more than one phenotype outcome is associated with the same rare variants. Most existing statistical methods for identifying rare variants associated with multiple phenotypes are based on a group test, in which pre-specified genetic regions are tested one at a time. However, these methods are not designed to locate susceptible rare variants within a genetic region.


Background: Identification of pleiotropic variants associated with multiple phenotypic traits has received increasing attention in genetic association studies. Overlapping genetic associations from multiple traits help to detect weak genetic associations missed by single-trait analyses. Many statistical methods have been developed to identify pleiotropic variants, but most are limited to quantitative traits, even though pleiotropic effects on both quantitative and qualitative traits have been observed.

Article Synopsis
  • This study evaluated the genetic diversity of 384 peanut germplasms, including Korean and USDA collections, using a 58K SNP array to analyze traits related to seed aspect ratio.
  • Researchers identified 14,030 polymorphic SNPs and found five SNPs significantly associated with seed aspect ratio, particularly focusing on chromosome Araip.B08.
  • The study highlights the potential role of the phosphoenolpyruvate carboxylase (PEPC) gene in influencing seed aspect ratio and provides valuable insights for future genetic research and breeding programs in peanuts.
Article Synopsis
  • Researchers genotyped 384 cowpea accessions from 21 countries using a large SNP dataset, which revealed four major populations based on geographic origins, including distinct groups from Korea and West Africa.
  • The analysis showed low genetic diversity and high inbreeding among the Korean cowpea accessions, providing insights that could aid in enhancing cowpea breeding programs in Korea.

Background: Prognostic genes or gene signatures have been widely used to predict patient survival and aid in making decisions pertaining to therapeutic actions. Although some web-based survival analysis tools have been developed, they have several limitations.

Objective: Taking these limitations into account, we developed ESurv (Easy, Effective, and Excellent Survival analysis tool), a web-based tool that can perform advanced survival analyses using user-derived data or data from The Cancer Genome Atlas (TCGA).


Gene set analysis aims to identify differentially expressed or co-expressed genes within a biological pathway between two experimental conditions, so that it can eventually reveal biological processes and pathways involved in disease development. In the last few decades, various statistical and computational methods have been proposed to improve statistical power of gene set analysis. In recent years, much attention has been paid to differentially co-expressed genes since they can be potentially disease-related genes without significant difference in average expression levels between two conditions.


M13 bacteriophage-based colorimetric sensors, especially multi-array sensors, have been successfully demonstrated to be a powerful platform for detecting extremely small amounts of target molecules. Colorimetric sensors can be fabricated easily using self-assembly of genetically engineered M13 bacteriophage which incorporates peptide libraries on its surface. However, the ability to discriminate many types of target molecules is still required.


Background: Pheochromocytoma and paraganglioma (PPGL) are tumours that arise from chromaffin cells. Several genetic mutations influence PPGL, the most relevant being those in genes encoding the subunits of succinate dehydrogenase (SDHA, SDHB, SDHC and SDHD) and its assembly factor (SDHAF2). However, the risk of metastasis posed by these mutations has not been reported, except for SDHB and SDHD mutations.


Background: In human genetic association studies with high-dimensional gene expression data, it is well known that statistical selection methods utilizing prior biological network knowledge, such as genetic pathways and signaling pathways, can outperform methods that ignore genetic network structures in terms of true positive selection. In recent epigenetic research on case-control association studies, a number of statistical methods have been proposed to identify cancer-related CpG sites and their corresponding genes from high-dimensional DNA methylation array data. However, most existing methods are not designed to utilize genetic network information, although methylation levels of genes linked in the genetic networks tend to be highly correlated with each other.


A large body of evidence suggests that B-cell lymphomas with enhanced Myc expression are associated with an aggressive phenotype and poor prognosis, which makes Myc a compelling therapeutic target. Phosphodiesterase 4B (PDE4B), a main hydrolyzer of cyclic AMP (cAMP) in B cells, was shown to be involved in cell survival and drug resistance in diffuse large B cell lymphomas (DLBCL). However, the interrelationship between Myc and PDE4B remains unclear.


Free fatty acids (FFAs), which are elevated with metabolic syndrome, are considered the principal offender exerting lipotoxicity. A few previous studies have reported a causal relationship between FFAs and osteoarthritis pathogenesis. However, the molecular mechanism by which FFAs exert lipotoxicity and induce osteoarthritis remains largely unknown.

Article Synopsis
  • Regularization methods are important for analyzing complex genomic data, especially with DNA methylation data from the Infinium HumanMethylation450 BeadChip, which contains multiple CpG sites for each gene.
  • The paper highlights two main regularization techniques: Sparse Group Lasso (SGL) for scenarios where most CpG sites in a gene are related to an outcome, and network-based regularization when only a few are relevant.
  • A new variable selection strategy is proposed that tracks selection frequency of variables from both methods, showing better performance in simulations and in identifying significant CpG sites linked to ovarian cancer.
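The selection-frequency idea in the last bullet can be illustrated with a small tally. This is only a minimal sketch under stated assumptions: the synopsis does not describe how the two methods are run or how frequencies are combined, so the function name `selection_frequency`, the input layout (one collection of selected CpG sites per run), and the pooled-frequency rule are all hypothetical.

```python
from collections import Counter


def selection_frequency(*method_runs):
    """Pool runs from several methods and return how often each variable was selected.

    Each argument is a list of runs for one method (for example SGL or a
    network-based penalty); each run is a collection of selected variable names.
    The pooling rule is an assumption made for illustration only.
    """
    counts = Counter()
    total_runs = 0
    for runs in method_runs:
        for selected in runs:
            counts.update(set(selected))  # count each variable once per run
            total_runs += 1
    return {name: n / total_runs for name, n in counts.items()}


# Hypothetical CpG site names; two runs per method.
sgl_runs = [{"cg1", "cg2"}, {"cg1"}]
network_runs = [{"cg1", "cg3"}, {"cg3"}]
freq = selection_frequency(sgl_runs, network_runs)
ranked = sorted(freq, key=freq.get, reverse=True)
```

Variables would then be ranked by pooled frequency, so a site picked consistently by either method rises to the top.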

Hyper-activation of PAK1 (p21-activated kinase 1) is frequently observed in human cancer, and PAK1 is considered a potential target for novel anti-tumor drugs. We previously showed that PAK1 is highly activated in the Smad4-deficient condition and suppresses PUMA (p53 upregulated modulator of apoptosis) through direct binding and phosphorylation. On the basis of this result, we searched for novel PAK1-PUMA binding inhibitors.


In human genome research, genetic association studies of rare variants have been widely conducted since the advent of high-throughput DNA sequencing platforms. However, detection of outcome-related rare variants remains a statistically challenging problem because the number of observed genetic mutations is extremely small. Recently, a power set-based statistical selection procedure has been proposed to locate both risk and protective rare variants within outcome-related genes or genetic regions.


Motivation: DNA methylation plays an important role in many biological processes and in cancer progression. Recent studies have found that groups can differ in methylation variance as well as in methylation means. Several methods have been developed that consider both mean and variance signals in order to improve the statistical power of detecting differentially methylated loci.

Article Synopsis
  • The text discusses the challenges of identifying genes related to diseases using high-dimensional genomic data, highlighting issues like multiple testing errors and the need for effective group testing procedures.
  • It compares traditional statistical group testing methods (like PCA and Hotelling's T² test) with a regularization approach known as group lasso, which uses penalized likelihood for regression analysis on genomic markers.
  • The study found significant discrepancies in the genes identified as associated with ovarian cancer between the traditional group testing methods and the group lasso approach, suggesting that different methods yield different results in gene selection.
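For readers unfamiliar with the group lasso mentioned above, the penalty term can be sketched as follows. This is a minimal illustration of the standard group lasso penalty, λ Σ_g √p_g ‖β_g‖₂, not the paper's implementation; the function name, the dictionary layout (gene name mapped to the coefficients of its markers), and the example values are hypothetical.

```python
import math


def group_lasso_penalty(groups, lam=1.0):
    """Standard group lasso penalty: lam * sum_g sqrt(p_g) * ||beta_g||_2.

    `groups` maps a group label (e.g. a gene) to the list of coefficients of
    the markers in that group; p_g is the group size.
    """
    return lam * sum(
        math.sqrt(len(coefs)) * math.sqrt(sum(b * b for b in coefs))
        for coefs in groups.values()
    )


# Hypothetical coefficients: geneA contributes, geneB is entirely zero.
penalty = group_lasso_penalty({"geneA": [3.0, 4.0], "geneB": [0.0, 0.0]})
```

Because the Euclidean norm is not squared, the penalty tends to set all coefficients of a group to zero together, which is why it selects whole genes rather than individual markers.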

Stress has been suggested as one of the important causes of human cancer, but without molecular biological evidence. We therefore tested the effect of stress-related hormones on cell viability and mitotic fidelity. Similarly to estrogen, the stress hormone cortisol and its relative cortisone increase microtubule organizing center (MTOC) number through elevated expression of γ-tubulin and confer Taxol resistance on human cancer cell lines.

Article Synopsis
  • Genetic association studies face challenges in pinpointing rare variants linked to complex diseases due to the small number of mutations observed and the presence of both risk and protective variants in the same genetic region.
  • Current statistical methods struggle to distinguish between causal and noncausal rare variants, despite their effectiveness in detecting phenotypic associations among groups of rare variants.
  • This article introduces a new statistical selection strategy that efficiently identifies causal rare variants by linearly combining potential risk and protective variants, demonstrating superior performance in simulations and real data from the ANGPTL gene family in the Dallas Heart Study.

We consider estimation and variable selection in high-dimensional Cox regression when prior knowledge of the relationships among the covariates, described by a network or graph, is available. A limitation of the existing methodology for survival analysis with high-dimensional genomic data is that a wealth of structural information about many biological processes, such as regulatory networks and pathways, has often been ignored. In order to incorporate such prior network information into the analysis of genomic data, we propose a network-based regularization method for high-dimensional Cox regression; it uses an ℓ1-penalty to induce sparsity of the regression coefficients and a quadratic Laplacian penalty to encourage smoothness between the coefficients of neighboring variables on a given network.
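The combination of a sparsity penalty and a Laplacian smoothness penalty can be made concrete with a small sketch. The abstract does not give the exact parameterization, so the form below, λ[α‖β‖₁ + (1 − α) βᵀLβ] with L the normalized graph Laplacian of an unweighted network, is an assumption, as are the function name and the `lam` and `alpha` parameters. For an unweighted graph the quadratic form βᵀLβ equals the sum over edges (u, v) of (β_u/√d_u − β_v/√d_v)², where d_u is the degree of node u.

```python
def network_penalty(beta, edges, lam=1.0, alpha=0.5):
    """Sketch of an L1 plus normalized-Laplacian penalty (assumed form).

    beta  : list of regression coefficients, one per node
    edges : list of (u, v) index pairs of an unweighted, undirected network
    """
    degree = [0] * len(beta)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Sparsity term: sum of absolute coefficients.
    l1_term = sum(abs(b) for b in beta)

    # Smoothness term: beta' L beta written edge by edge.
    smooth_term = sum(
        (beta[u] / degree[u] ** 0.5 - beta[v] / degree[v] ** 0.5) ** 2
        for u, v in edges
    )
    return lam * (alpha * l1_term + (1 - alpha) * smooth_term)


# Two linked genes with equal coefficients incur no smoothness cost;
# opposite signs are penalized.
same_sign = network_penalty([1.0, 1.0, 0.0], [(0, 1)])
opposite_sign = network_penalty([1.0, -1.0, 0.0], [(0, 1)])
```

In a full method this term would be added to the negative Cox partial log-likelihood before minimization; only the penalty is shown here.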


Motivation: Existing association methods for rare variants from sequencing data have focused on aggregating variants in a gene or a genetic region because analysing individual rare variants is underpowered. However, these rare variant detection methods cannot identify which of the rare variants in a gene or genetic region are associated with the complex diseases or traits. Once phenotypic associations of a gene or a genetic region are identified, the natural next step in an association study with sequencing data is to locate the susceptible rare variants within the gene or the genetic region.


Matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies, especially epigenetic studies with DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, there have been few variable selection methods for matched sets. In an earlier paper, we proposed a penalized logistic regression model for the analysis of unmatched DNA methylation data using a network-based penalty.


Gaussian graphical models have been widely used as an effective method for studying the conditional independence structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to incorrect inference on the dependency structure among the genes.
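As a reminder of what "conditional independence structure" means in a Gaussian graphical model: two variables are conditionally independent given the rest exactly when the corresponding entry of the precision (inverse covariance) matrix Θ is zero, and the partial correlation is ρ_ij = −Θ_ij / √(Θ_ii Θ_jj). The sketch below reads a network off a known precision matrix; the function names and the example matrix are hypothetical, and it does not address the heavy-tail problem raised in the abstract.

```python
import math


def partial_correlations(precision):
    """Partial correlation rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)."""
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return rho


def graph_edges(precision, tol=1e-12):
    """Edges of the graphical model: pairs with a non-zero precision entry."""
    p = len(precision)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(precision[i][j]) > tol
    ]


# Hypothetical chain of three genes: gene 0 - gene 1 - gene 2.
theta = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
rho = partial_correlations(theta)
edges = graph_edges(theta)
```

Here genes 0 and 2 are marginally correlated through gene 1 but have zero partial correlation, so the graph has no edge between them. In practice Θ must be estimated, and outliers can distort that estimate, which is the issue the abstract raises.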


Motivation: DNA methylation is a molecular modification of DNA that plays crucial roles in the regulation of gene expression. In particular, CpG-rich regions are frequently hypermethylated in cancer tissues but not methylated in normal tissues. However, there is little methodological literature on case-control association studies for high-dimensional DNA methylation data, compared with that for microarray gene expression.


Many different biological processes are represented by network graphs such as regulatory networks, metabolic pathways, and protein-protein interaction networks. Since genes that are linked on the networks usually have biologically similar functions, the linked genes form molecular modules that affect clinical phenotypes and outcomes. Similarly, in large-scale genetic association studies, many SNPs are in high linkage disequilibrium (LD), which can also be summarized as an LD graph.
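One way to picture an LD graph is to connect every pair of SNPs whose pairwise r² exceeds a cutoff. The sketch below is an assumption-laden illustration, not the construction used in the paper: it approximates r² by the squared Pearson correlation of genotype dosages (0, 1, or 2 copies of the minor allele), and the function names, the 0.8 threshold, and the toy genotypes are all hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as no LD
    return sxy * sxy / (sxx * syy)


def ld_graph(genotypes, threshold=0.8):
    """Return edges (i, j) between SNPs whose r^2 is at least `threshold`."""
    return [
        (i, j)
        for i, j in combinations(range(len(genotypes)), 2)
        if r_squared(genotypes[i], genotypes[j]) >= threshold
    ]


# Toy data: SNP 0 and SNP 1 are in perfect LD, SNP 2 is uncorrelated with both.
snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 2, 0, 2, 0],
]
edges = ld_graph(snps)
```

The resulting edge list can then play the same role for SNPs that a pathway or interaction network plays for genes.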
