Knowledge graphs, powerful tools that explicitly transfer knowledge to machines, have significantly advanced new knowledge inferences. Discovering unknown relationships between diseases and genes/proteins in biomedical knowledge graphs can lead to the identification of disease development mechanisms and new treatment targets. Generating high-quality representations of biomedical entities is essential for successfully predicting disease-gene/protein associations.
View Article and Find Full Text PDFThe advent of artificial intelligence (AI) has enabled a comprehensive exploration of materials for various applications. However, AI models often prioritize frequently encountered material examples in the scientific literature, limiting the selection of suitable candidates based on inherent physical and chemical attributes. To address this imbalance, we generated a dataset consisting of 1,453,493 natural language-material narratives from OQMD, Materials Project, JARVIS, and AFLOW2 databases based on ab initio calculation results that are more evenly distributed across the periodic table.
View Article and Find Full Text PDFMass spectra, which are agglomerations of ionized fragments from targeted molecules, play a crucial role across various fields for the identification of molecular structures. A prevalent analysis method involves spectral library searches, where unknown spectra are cross-referenced with a database. The effectiveness of such search-based approaches, however, is restricted by the scope of the existing mass spectra database, underscoring the need to expand the database via mass spectra prediction.
View Article and Find Full Text PDFIn recent years, molecular representation learning has emerged as a key area of focus in various chemical tasks. However, many existing models fail to fully consider the geometric information on molecular structures, resulting in less intuitive representations. Moreover, the widely used message passing mechanism is limited to providing the interpretation of experimental results from a chemical perspective.
View Article and Find Full Text PDFAmyloid positivity is an early indicator of Alzheimer's disease and is necessary to determine the disease. In this study, a deep generative model is utilized to predict the amyloid positivity of cognitively normal individuals using proxy measures, such as structural MRI scans, demographic variables, and cognitive scores, instead of invasive direct measurements. Through its remarkable efficacy in handling imperfect datasets caused by missing data or labels, and imbalanced classes, the model outperforms previous studies and widely used machine learning approaches with an AUROC of 0.
View Article and Find Full Text PDFThe detection of somatic DNA variants in tumor samples with low tumor purity or sequencing depth remains a daunting challenge despite numerous attempts to address this problem. In this study, we constructed a substantially extended set of actual positive variants originating from a wide range of tumor purities and sequencing depths, as well as actual negative variants derived from sequencer-specific sequencing errors. A deep learning model named AIVariant, trained on this extended dataset, outperforms previously reported methods when tested under various tumor purities and sequencing depths, especially low tumor purity and sequencing depth.
View Article and Find Full Text PDFBackground: Although coronary computed tomography angiography (CCTA) is currently utilized as the frontline test to accurately diagnose coronary artery disease (CAD) in clinical practice, there are still debates regarding its use as a screening tool for the asymptomatic population. Using deep learning (DL), we sought to develop a prediction model for significant coronary artery stenosis on CCTA and identify the individuals who would benefit from undergoing CCTA among apparently healthy asymptomatic adults.
Methods: We retrospectively reviewed 11,180 individuals who underwent CCTA as part of routine health check-ups between 2012 and 2019.
Recently, many electrocardiogram (ECG) classification algorithms using deep learning have been proposed. Because the ECG characteristics vary across datasets owing to variations in factors such as recorded hospitals and the race of participants, the model needs to have a consistently high generalization performance across datasets. In this study, as part of the PhysioNet/Computing in Cardiology Challenge (PhysioNet Challenge) 2021, we present a model to classify cardiac abnormalities from the 12- and the reduced-lead ECGs.
View Article and Find Full Text PDFFront Comput Neurosci
November 2022
Backpropagation has been regarded as the most favorable algorithm for training artificial neural networks. However, it has been criticized for its biological implausibility because its learning mechanism contradicts the human brain. Although backpropagation has achieved super-human performance in various machine learning applications, it often shows limited performance in specific tasks.
View Article and Find Full Text PDFSilent communication based on biosignals from facial muscle requires accurate detection of its directional movement and thus optimally positioning minimum numbers of sensors for higher accuracy of speech recognition with a minimal person-to-person variation. So far, previous approaches based on electromyogram or pressure sensors are ineffective in detecting the directional movement of facial muscles. Therefore, in this study, high-performance strain sensors are used for separately detecting - and -axis strain.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
March 2024
Obtaining accurate pixel-level localization from class labels is a crucial process in weakly supervised semantic segmentation and object localization. Attribution maps from a trained classifier are widely used to provide pixel-level localization, but their focus tends to be restricted to a small discriminative region of the target object. An AdvCAM is an attribution map of an image that is manipulated to increase the classification score produced by a classifier before the final softmax or sigmoid layer.
View Article and Find Full Text PDFA molecule is a complex of heterogeneous components, and the spatial arrangements of these components determine the whole molecular properties and characteristics. With the advent of deep learning in computational chemistry, several studies have focused on how to predict molecular properties based on molecular configurations. MA message-passing neural network provides an effective framework for capturing molecular geometric features with the perspective of a molecule as a graph.
View Article and Find Full Text PDFPurpose: To develop and evaluate the performance of a polygenic risk score (PRS) constructed in a Korean male population to predict clinically significant prostate cancer (csPCa).
Materials And Methods: Total 2,702 PCa samples and 7,485 controls were used to discover csPCa susceptible single nucleotide polymorphisms (SNPs). Males with biopsy-proven or post-radical prostatectomy Gleason score 7 or higher were included for analysis.
Bioinformatics
January 2022
Motivation: MicroRNAs (miRNAs) play pivotal roles in gene expression regulation by binding to target sites of messenger RNAs (mRNAs). While identifying functional targets of miRNAs is of utmost importance, their prediction remains a great challenge. Previous computational algorithms have major limitations.
View Article and Find Full Text PDFAlthough prime editing is a promising genome editing method, the efficiency of prime editor 2 (PE2) is often insufficient. Here we generate a more efficient variant of PE2, named hyPE2, by adding the Rad51 DNA-binding domain. When tested at endogenous sites, hyPE2 shows a median of 1.
View Article and Find Full Text PDFHeat shock proteins (HSPs) play a pivotal role as molecular chaperones against unfavorable conditions. Although HSPs are of great importance, their computational identification remains a significant challenge. Previous studies have two major limitations.
View Article and Find Full Text PDFModels from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time with respect to previous approaches, yet with comparable or improved predictive ability.
View Article and Find Full Text PDFDNA has not been utilized to record temporal information, although DNA has been used to record biological information and to compute mathematical problems. Here, we found that indel generation by Cas9 and guide RNA can occur at steady rates, in contrast to typical dynamic biological reactions, and the accumulated indel frequency can be a function of time. By measuring indel frequencies, we developed a method for recording and measuring absolute time periods over hours to weeks in mammalian cells.
View Article and Find Full Text PDFIEEE Trans Neural Netw Learn Syst
August 2022
Learning classifiers with imbalanced data can be strongly biased toward the majority class. To address this issue, several methods have been proposed using generative adversarial networks (GANs). Existing GAN-based methods, however, do not effectively utilize the relationship between a classifier and a generator.
View Article and Find Full Text PDFPurpose: Multigene assays provide useful prognostic information regarding hormone receptor (HR)-positive breast cancer. Next-generation sequencing (NGS)-based platforms have numerous advantages including reproducibility and adaptability in local laboratories. This study aimed to develop and validate an NGS-based multigene assay to predict the distant recurrence risk.
View Article and Find Full Text PDFAnnu Int Conf IEEE Eng Med Biol Soc
July 2020
Heart disease and stroke are the leading causes of death worldwide. High blood pressure greatly increases the risk of heart disease and stroke. Therefore, it is important to control blood pressure (BP) through regular BP monitoring; as such, it is necessary to develop a method to accurately and conveniently predict BP in a variety of settings.
View Article and Find Full Text PDFPrime editing enables the introduction of virtually any small-sized genetic change without requiring donor DNA or double-strand breaks. However, evaluation of prime editing efficiency requires time-consuming experiments, and the factors that affect efficiency have not been extensively investigated. In this study, we performed high-throughput evaluation of prime editor 2 (PE2) activities in human cells using 54,836 pairs of prime editing guide RNAs (pegRNAs) and their target sequences.
View Article and Find Full Text PDFIEEE/ACM Trans Comput Biol Bioinform
April 2022
Recent advances in next-generation sequencing technologies have led to the successful insertion of video information into DNA using synthesized oligonucleotides. Several attempts have been made to embed larger data into living organisms. This process of embedding messages is called steganography and it is used for hiding and watermarking data to protect intellectual property.
View Article and Find Full Text PDF