Publications by authors named "Sangsoo Lim"

Motivation: Lead identification is a fundamental step in prioritizing candidate compounds for the downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds using both chemical property and experimental information. However, ML and DL methods rarely consider compound similarity information directly, since their models are built on abstract representations of molecules.

Computational drug repurposing aims to identify new indications for existing drugs by utilizing high-throughput data, often in the form of biomedical knowledge graphs. However, learning on biomedical knowledge graphs can be challenging because genes dominate the graph while drug and disease entities are few, resulting in less effective representations. To overcome this challenge, we propose a "semantic multi-layer guilt-by-association" approach that leverages the principle of guilt-by-association ("similar genes share similar functions") at the drug-gene-disease level.

Drug-induced liver injury (DILI) is the main cause of drug failure in clinical trials. The characterization of toxic compounds in terms of chemical structure is important because compounds can be metabolized to toxic substances in the liver. Traditional machine learning approaches have had limited success in predicting DILI, and emerging deep graph neural network (GNN) models are not yet powerful enough to predict DILI.

A large number of chemical compounds are available in databases such as PubChem and ZINC. However, the currently known compounds, though numerous, represent only a fraction of all possible compounds, a set known as chemical space. Many of the compounds in these databases are annotated with properties and assay data that can be used for drug discovery efforts.

Emerging evidence indicates that the accretion of senescent cells is linked to metabolic disorders. However, the underlying mechanisms and metabolic consequences of cellular senescence in obesity remain obscure. In this study, we found that obese adipocytes are senescence-susceptible cells accompanied by genome instability.

Cervical lymph node metastasis is the leading cause of poor prognosis in oral tongue squamous cell carcinoma and also occurs in the early stages. The current clinical diagnosis depends on a physical examination that is not enough to determine whether micrometastasis remains. The transcriptome profiling technique has shown great potential for predicting micrometastasis by capturing the dynamic activation state of genes.

Multi-omics data is frequently measured to enrich the comprehension of biological mechanisms underlying certain phenotypes. However, due to the complex relations and high dimension of multi-omics data, it is difficult to associate omics features to certain biological traits of interest. For example, the clinically valuable breast cancer subtypes are well-defined at the molecular level, but are poorly classified using gene expression data.

Reactive oxygen species (ROS) are associated with various roles of brown adipocytes. Glucose-6-phosphate dehydrogenase (G6PD) controls cellular redox potentials by producing NADPH. Although G6PD upregulates cellular ROS levels in white adipocytes, the roles of G6PD in brown adipocytes remain elusive.

Motivation: Multi-omics data in molecular biology has accumulated rapidly over the years. Such data contains valuable information for research in medicine and drug discovery. Unfortunately, data-driven research in medicine and drug discovery is challenging for the majority of small research labs due to the large volume of data and the complexity of analysis pipelines.

A gene expression profile, or transcriptome, can represent cellular states; thus, understanding gene regulation mechanisms can help explain how cells respond to external stress. The interaction between a transcription factor (TF) and its target gene (TG) is one of the representative regulatory mechanisms in cells. In this paper, we present a novel computational method to construct condition-specific transcriptional networks from transcriptome data.

There has recently been rapid progress in computational methods for determining the protein targets of small-molecule drugs, a task termed compound-protein interaction (CPI) prediction. In this review, we comprehensively cover topics related to the computational prediction of CPI. Data for CPI has been accumulated and curated significantly, both in quantity and in quality.

White adipose tissue (WAT) is a key regulator of systemic energy metabolism, and impaired WAT plasticity, characterized by enlargement of preexisting adipocytes, is associated with WAT dysfunction, obesity, and metabolic complications. However, the mechanisms that retain the proper adipose tissue plasticity required for metabolic fitness are unclear. Here, we comprehensively showed that adipocyte-specific DNA methylation, manifested in enhancers and CTCF sites, directs distal enhancer-mediated transcriptomic features required to conserve metabolic functions of white adipocytes.

Pharmacogenomics is the study of how genes affect a person's response to drugs. Thus, understanding the effect of a drug at the molecular level can be helpful in both drug discovery and personalized medicine. Over the years, transcriptome data upon drug treatment has been collected, and several databases have compiled either multi-omics data from cancer cells before drug treatment, together with drug sensitivity measures such as AUC, or time-series transcriptomic data after drug treatment.

Motivation: Biological pathways are an important form of curated knowledge about biological processes. Thus, cancer subtype classification based on pathways will be very useful for understanding differences in biological mechanisms among cancer subtypes. However, pathways include only a fraction of the entire gene set (only one-third of human genes in KEGG), and pathways are fragmented.

Motivation: Intratumor heterogeneity (ITH) represents the diversity of cell populations that make up cancer tissue. The level of ITH in a tumor is usually measured by a genomic variation profile, such as copy number variation and somatic mutation. However, a recent study has identified ITH at the transcriptome level and suggested that ITH at gene expression levels is useful for predicting prognosis.

An enhancer is a DNA sequence in a genome that controls transcription of downstream target genes. Enhancers are known to be associated with certain epigenetic signatures. Machine learning tools, such as CSI-ANN, ChromHMM, and RFECS, were developed for predicting enhancers using various epigenetic features.

Motivation: Biological pathways are extensively used for the analysis of transcriptome data to characterize biological mechanisms underlying various phenotypes. There are a number of computational tools that summarize transcriptome data at the pathway level. However, there is no comparative study on how well these tools produce useful information at the cohort level, enabling comparison of many samples or patients.

Background: Identifying perturbed pathways in a given condition is crucial in understanding biological phenomena. In addition to identifying perturbed pathways individually, pathway analysis should consider interactions among pathways. Currently available pathway interaction prediction methods are based on the existence of overlapping genes between pathways, protein-protein interaction (PPI) or functional similarities.

Intratumor heterogeneity (ITH) is observed at different stages of tumor progression, metastasis, and recurrence, which can be important for clinical applications. We used RNA-sequencing data from tumor samples and measured the level of ITH in terms of biological network states. To model complex relationships among genes, we used a protein interaction network to consider gene-gene dependency.

A breast cancer subtype classification scheme, PAM50, based on genetic information is widely accepted for clinical applications. On the other hand, experimental cancer biology studies have been successful in revealing the mechanisms of breast cancer, and the hallmarks of cancer have now been determined to explain the core mechanisms of tumorigenesis. Thus, it is important to understand how the breast cancer subtypes are related to the cancer core mechanisms, but multiple studies are yet to address the hallmarks of breast cancer subtypes.

Motivation: Transcriptome data from gene knockout experiments in mice is widely used to investigate the functions of genes and their relationships to phenotypes. When a gene is knocked out, it is important to identify which genes are affected by the knocked-out gene. Existing methods, including differentially expressed gene (DEG) methods, can be used for the analysis.

Oxidized low density lipoproteins (Ox-LDLs) have an important role in the development of age-related vascular diseases, such as atherosclerosis. Ox-LDLs are defined as LDLs in the blood that have been oxidatively modified by enzymatic or non-enzymatic oxidation of phospholipids (PLs). For the characterization of Ox-LDLs at the molecular level, oxidation patterns of oxidized PL (Ox-PL) products were systematically examined with standard PL molecules (16:0/22:6-PC, 18:0/22:6-PA, and 18:0/22:6-PG), by the formation of bilayer vesicles of each standard, followed by oxidation of PL vesicles using a Cu(2+) solution.

This study demonstrates the potential utility of on-line chip-type asymmetrical flow field-flow fractionation (cAF4) and electrospray ionization tandem mass spectrometry (ESI-MS-MS) for the top-down lipidomic analysis of human lipoproteins. Utilizing a cAF4, which is a miniaturized AF4 channel operated with a micro flow rate regime, enabled high density lipoprotein (HDL) and low density lipoprotein (LDL) to be separated by hydrodynamic diameter in an aqueous solution with the simultaneous desalting of lipoproteins. On-line desalting was found to enhance the ionization of lipoproteinic lipid molecules during the feeding of cAF4 effluent to ESI-MS when compared to the direct infusion of lipoproteins to MS.
