Publications by authors named "Giltae Song"

Article Synopsis
  • Complex structural variations (cxSVs) are often missed in genome studies due to difficulties in detection, but ARC-SV provides a machine-learning method to accurately identify and reconstruct them from standard genetic datasets.
  • * Our research identified cxSVs as key contributors to human genetic diversity, showing that rare cxSVs frequently occur in genes related to neural functions and rapid human evolution.
  • * Through advanced analysis of brain tissue, we found cxSVs are linked to gene expression changes and are associated with psychiatric disorders, suggesting they play a significant role in the development of neuropsychiatric conditions.
View Article and Find Full Text PDF

Purpose: To represent 24-2 visual field (VF) losses of individual patients using a hybrid approach of archetypal analysis (AA) and fuzzy c-means (FCM) clustering.

Methods: In this multicenter retrospective study, we classified characteristic patterns of 24-2 VF using AA and decomposed them with FCM clustering. We predicted the change in mean deviation (MD) through supervised machine learning from decomposition coefficient change.

View Article and Find Full Text PDF
Article Synopsis
  • The study aims to develop a reliable and interpretable AI system called RIAS to predict mortality in patients after an acute myocardial infarction (AMI), filling the gap in existing tools for clinicians.
  • RIAS uses data from the Korean Acute Myocardial Infarction Registry and outperforms traditional models like XGBoost in sensitivity, while also providing important insights into the effects of medications and age on patient mortality.
  • The framework addresses AI's "black-box" problem by offering clear explanations for predictions, including "what if" scenarios that help clinicians make better, patient-specific treatment decisions.
View Article and Find Full Text PDF

Background: Aptamers, which are biomaterials comprised of single-stranded DNA/RNA that form tertiary structures, have significant potential as next-generation materials, particularly for drug discovery. The systematic evolution of ligands by exponential enrichment (SELEX) method is a critical in vitro technique employed to identify aptamers that bind specifically to target proteins. While advanced SELEX-based methods such as Cell- and HT-SELEX are available, they often encounter issues such as extended time consumption and suboptimal accuracy.

View Article and Find Full Text PDF

Motivation: Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures.

Results: Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields.

View Article and Find Full Text PDF

Detection of somatic mutation in whole-exome sequencing data can help elucidate the mechanism of tumor progression. Most computational approaches require exome sequencing for both tumor and normal samples. However, it is more common to sequence exomes for tumor samples only without the paired normal samples.

View Article and Find Full Text PDF

Motivation: Over the past decades, vast amounts of genome sequencing data have been produced, requiring an enormous level of storage capacity. The time and resources needed to store and transfer such data cause bottlenecks in genome sequencing analysis. To resolve this issue, various compression techniques have been proposed to reduce the size of original FASTQ raw sequencing data, but these remain suboptimal.

View Article and Find Full Text PDF

Oligonucleotide-based aptamers, which have a three-dimensional structure with a single-stranded fragment, feature various characteristics with respect to size, toxicity, and permeability. Accordingly, aptamers are advantageous in terms of diagnosis and treatment and are materials that can be produced through relatively simple experiments. Systematic evolution of ligands by exponential enrichment (SELEX) is one of the most widely used experimental methods for generating aptamers; however, it is highly expensive and time-consuming.

View Article and Find Full Text PDF

Background: Recently, the possibility of tumour classification based on genetic data has been investigated. However, genetic datasets are difficult to handle because of their massive size and complexity of manipulation. In the present study, we examined the diagnostic performance of machine learning applications using imaging-based classifications of oral squamous cell carcinoma (OSCC) gene sets.

View Article and Find Full Text PDF

Background: Macaque species share >93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases).

View Article and Find Full Text PDF

ChIP-seq is one of the core experimental resources available to understand genome-wide epigenetic interactions and identify the functional elements associated with diseases. The analysis of ChIP-seq data is important but poses a difficult computational challenge, due to the presence of irregular noise and bias on various levels. Although many peak-calling methods have been developed, the current computational tools still require, in some cases, human manual inspection using data visualization.

View Article and Find Full Text PDF

Background: Genomic data have become major resources to understand complex mechanisms at fine-scale temporal and spatial resolution in functional and evolutionary genetic studies, including human diseases, such as cancers. Recently, a large number of whole genomes of evolving populations of yeast (Saccharomyces cerevisiae W303 strain) were sequenced in a time-dependent manner to identify temporal evolutionary patterns. For this type of study, a chromosome-level sequence assembly of the strain or population at time zero is required to compare with the genomes derived later.

View Article and Find Full Text PDF

HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective.

View Article and Find Full Text PDF

K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed.

View Article and Find Full Text PDF

One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query () which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database.

View Article and Find Full Text PDF

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation.

View Article and Find Full Text PDF

In recent years, thousands of Saccharomyces cerevisiae genomes have been sequenced to varying degrees of completion. The Saccharomyces Genome Database (SGD) has long been the keeper of the original eukaryotic reference genome sequence, which was derived primarily from S. cerevisiae strain S288C.

View Article and Find Full Text PDF

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation.

View Article and Find Full Text PDF

The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated.

View Article and Find Full Text PDF

Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology.

View Article and Find Full Text PDF

Background: Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments.

View Article and Find Full Text PDF

Background: Gene clusters are genetically important, but their analysis poses significant computational challenges. One of the major reasons for these difficulties is gene conversion among the duplicated regions of the cluster, which can obscure their true relationships. Many computational methods for detecting gene conversion events have been released, but their performance has not been assessed for wide deployment in evolutionary history studies due to a lack of accurate evaluation methods.

View Article and Find Full Text PDF

Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. These clusters are one of the major sources of evolutionary innovation, and they are linked to multiple diseases, including HIV and a variety of cancers. Understanding their evolutionary histories is a key to the application of comparative genomics methods in these regions of the genome.

View Article and Find Full Text PDF

Much important evolutionary activity occurs in gene clusters, where a copy of a gene may be free to acquire new functions. Current computational methods to extract evolutionary information from sequence data for such clusters are suboptimal, in part because accurate sequence data are often lacking in these genomic regions, making existing methods difficult to apply. We describe a new method for reconstructing the recent evolutionary history of gene clusters, and evaluate its performance on both simulated data and actual human gene clusters.

View Article and Find Full Text PDF