Publications by authors named "Masaaki Kotera"

The design of RNA plays a crucial role in developing RNA vaccines, nucleic acid therapeutics, and innovative biotechnological tools. However, existing techniques frequently lack versatility across various tasks and are dependent on pre-defined secondary structure or other prior knowledge. To address these limitations, we introduce GenerRNA, a Transformer-based model inspired by the success of large language models (LLMs) in protein and molecule generation.

Ensemble learning improves machine learning results by combining several models, which allows better predictive performance than a single model. It also benefits and accelerates research on quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR). However, as ensemble learning models such as random forest grow in number, the effectiveness of QSAR/QSPR is limited by these models' inability to explain their predictions to researchers.
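
A minimal sketch of the basic idea of combining several models, assuming simple regressors whose predictions are averaged (a bagging-style combination). The three linear "models" and the descriptor value are hypothetical placeholders, not models from the study.

```python
from statistics import mean
from typing import Callable, Sequence

# A "model" here is any function mapping a descriptor vector to a predicted value.
Model = Callable[[Sequence[float]], float]


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> float:
    """Combine several regressors by averaging their predictions."""
    if not models:
        raise ValueError("need at least one model")
    return mean(m(x) for m in models)


# Hypothetical placeholder models (not from the study).
models = [
    lambda x: 0.5 * x[0] + 1.0,
    lambda x: 0.4 * x[0] + 1.5,
    lambda x: 0.6 * x[0] + 0.4,
]
print(ensemble_predict(models, [2.0]))
```

Averaging reduces the variance of individual models, but it does not by itself make the combined prediction interpretable, which is the limitation the abstract points to.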

The development of novel organic compounds with desired properties is time-consuming and costly. Thus, quantitative structure-property relationship (QSPR) models are widely used to discover compounds with the desired properties efficiently. Structure generators can produce novel structures in silico from a variety of input structures.

Article Synopsis
  • The 2015 BioHackathon brought together scientists and developers to create tools for sharing and reusing biological data.
  • They talked about problems with how to represent and use different kinds of biological information, like DNA and proteins.
  • The group shared their progress in fixing these issues and discussed future goals to improve how researchers can use biological data in their work.

Background: Natural products are the source of various functional materials such as medicines, and understanding their biosynthetic pathways can provide information that helps produce them effectively through synthetic biology. A number of studies have aimed to predict biosynthetic pathways from chemical structures in a retrosynthetic manner; however, the calculation sometimes finishes without reaching a starting material from the target molecule. To address this problem, a method for finding suitable starting materials is required.
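
A toy sketch of a retrosynthetic breadth-first search that, besides any known starting materials it reaches, reports the dead-end compounds where the search stopped as candidate starting points. It assumes a hand-written precursor map (`precursors`, hypothetical) rather than real reaction rules, and it is not the method proposed in this work.

```python
from collections import deque
from typing import Dict, List, Set


def backward_search(target: str,
                    precursors: Dict[str, List[str]],
                    known_starts: Set[str],
                    max_depth: int = 5) -> Dict[str, Set[str]]:
    """Walk backward from target through a (hypothetical) precursor map.

    Returns the known starting materials reached and the dead-end compounds
    (no known precursor, or depth limit hit) that could serve as candidates.
    """
    reached: Set[str] = set()
    dead_ends: Set[str] = set()
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        compound, depth = queue.popleft()
        if compound in known_starts:
            reached.add(compound)
            continue
        parents = precursors.get(compound, [])
        if not parents or depth == max_depth:
            dead_ends.add(compound)
            continue
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return {"reached": reached, "dead_ends": dead_ends}


# Placeholder compound names; no real chemistry is implied.
precursors = {"target": ["int_A", "int_B"], "int_A": ["start_C"]}
print(backward_search("target", precursors, known_starts={"start_C"}))
# {'reached': {'start_C'}, 'dead_ends': {'int_B'}}
```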

Cucurbitacins are highly oxygenated triterpenoids characteristic of plants in the family Cucurbitaceae and are responsible for the bitter taste of these plants. Fruits of bitter melon contain various cucurbitacins possessing an unusual ether bridge between C5 and C19 that is not observed in other Cucurbitaceae members. Using a combination of next-generation sequencing, RNA-Seq analysis, and gene-to-gene co-expression analysis with the ConfeitoGUIplus software, we identified three P450 genes expected to be involved in cucurbitacin biosynthesis.

Cytochrome P450 (CYP) is an enzyme family that plays a crucial role in metabolism, mainly by metabolizing xenobiotics into non-toxic structures; however, some metabolized products can cause hepatotoxicity. Hence, predicting the structures of CYP products is an important task in designing non-hepatotoxic drugs. Here, we have developed novel atomic descriptors to predict the sites of metabolism (SoM) in CYP substrates.

Background: Characterizing drug-protein interaction networks with biological features has become a challenging problem in pharmaceutical science, with the aim of better understanding polypharmacology.

Results: We present a novel method for systematically analyzing the underlying features characteristic of drug-protein interaction networks, which we call "drug-protein interaction signatures", by integrating large-scale heterogeneous data on drugs and proteins. We develop a new efficient algorithm for extracting informative drug-protein interaction signatures, which is made possible by space-efficient representations of fingerprints for drug-protein pairs and sparsity-inducing classifiers.
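
A sketch of how a drug-protein pair fingerprint can be stored space-efficiently, assuming binary fingerprints and a tensor-product pair representation (pair feature (i, j) is on only when drug bit i and protein bit j are both on). Only on-bit indices are kept, so the full product vector is never materialized. The function names are hypothetical, and the weights stand in for a model trained with an L1 (sparsity-inducing) penalty; they are made up.

```python
from itertools import product
from typing import Dict, Iterable, Set


def pair_fingerprint(drug_bits: Iterable[int], protein_bits: Iterable[int],
                     n_protein_features: int) -> Set[int]:
    """On-bit indices of the tensor product of a drug and a protein fingerprint.

    Pair feature (i, j) is on iff drug bit i and protein bit j are both on;
    it is flattened to the single index i * n_protein_features + j.
    """
    p_bits = list(protein_bits)
    if any(j < 0 or j >= n_protein_features for j in p_bits):
        raise ValueError("protein bit index out of range")
    return {i * n_protein_features + j for i, j in product(drug_bits, p_bits)}


def sparse_score(active: Set[int], weights: Dict[int, float], bias: float = 0.0) -> float:
    """Linear decision score that only looks up the stored nonzero weights."""
    return bias + sum(weights.get(k, 0.0) for k in active)


# Hypothetical fingerprints and weights (as if learned with an L1 penalty).
fp = pair_fingerprint(drug_bits={0, 2}, protein_bits={1}, n_protein_features=3)
print(sorted(fp))                           # [1, 7]
print(sparse_score(fp, {7: 1.5, 4: -2.0}))  # 1.5
```

Because an L1 penalty drives most weights to zero, the weight dictionary stays small, and the surviving pair features are the candidates for interpretable "signatures".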

Membranolytic anticancer peptides (ACPs) are drawing increasing attention as potential future therapeutics against cancer, due to their ability to hinder the development of cellular resistance and their potential to overcome common hurdles of chemotherapy, e.g., side effects and cytotoxicity.

Small molecules can be represented in various file formats: (1) line notations such as SMILES (Simplified Molecular Input Line Entry System) and InChI (International Chemical Identifier), and (2) table formats such as molfile, SDF (Structure Data File), and KCF (KEGG Chemical Function). KCF and KCF-S (KEGG Chemical Function-and-Substructures) attach physicochemical property labels to the representations of small molecules, and they contribute to improved analysis of compound-protein networks, including drug-target interactions, and of compound-compound networks, including metabolic pathways. In this chapter, the main concepts, usage, and some example applications of the KCFCO and KCF-S packages are explained.

Although host-plant selection is a central topic in ecology, its general underpinnings are poorly understood. Here, we performed a case study focusing on publicly available data on Japanese butterflies. A combined statistical analysis of plant-herbivore relationships and taxonomy revealed that some butterfly subfamilies in different families feed on the same plant families, and that this occurs more often than expected by chance, indicating the independent acquisition of adaptive phenotypes to the same hosts.
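
One standard way to ask whether two host ranges share more plant families than expected by chance is a hypergeometric tail probability. This is a generic illustration, not necessarily the statistics used in the study, and the counts below are hypothetical.

```python
from math import comb


def overlap_p_value(n_families: int, a: int, b: int, shared: int) -> float:
    """P(overlap >= shared) for two host ranges of sizes a and b drawn uniformly
    at random from n_families plant families (hypergeometric upper tail)."""
    if not (0 <= a <= n_families and 0 <= b <= n_families):
        raise ValueError("host-range sizes must lie in [0, n_families]")
    start = max(shared, 0)
    upper = min(a, b)
    # comb(n, k) returns 0 when k > n, so impossible overlaps contribute nothing.
    hits = sum(comb(a, k) * comb(n_families - a, b - k) for k in range(start, upper + 1))
    return hits / comb(n_families, b)


# Hypothetical counts: 10 plant families, each subfamily uses 3, and all 3 are shared.
print(overlap_p_value(10, 3, 3, 3))  # 1/120
```

A permutation test that shuffles the observed host assignments is a common alternative when uniform random draws are too simple a null model.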

The identification of the modes of action of bioactive compounds is a major challenge in chemical systems biology of diseases. Genome-wide expression profiling of transcriptional responses to compound treatment for human cell lines is a promising unbiased approach for the mode-of-action analysis. Here we developed a novel approach to elucidate the modes of action of bioactive compounds in a cell-specific manner using large-scale chemically-induced transcriptome data acquired from the Library of Integrated Network-based Cellular Signatures (LINCS), and analyzed 16,268 compounds and 68 human cell lines.
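
A sketch of one common way to compare transcriptional signatures: rank reference signatures by their Pearson correlation with a query compound's signature measured in the same cell line. The gene-level vectors and reference names are hypothetical, and the study's actual procedure may differ.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length signature vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant signature has undefined correlation")
    return cov / (sx * sy)


def rank_references(query: Sequence[float],
                    references: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Reference signatures sorted from most to least correlated with the query."""
    scores = [(name, pearson(query, sig)) for name, sig in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical signatures (expression changes of four genes in one cell line).
query = [1.2, -0.5, 0.3, 2.0]
refs = {"ref_same_moa": [1.0, -0.4, 0.2, 1.8], "ref_opposite": [-1.1, 0.6, -0.2, -1.9]}
for name, r in rank_references(query, refs):
    print(name, round(r, 3))
```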

Metabolic pathway reconstruction is a challenge for understanding the metabolic pathways of organisms of interest. Different strategies (reference-based or de novo) must be used for pathway reconstruction depending on the availability of well-characterized enzymatic reactions.

Motivation: Metabolic pathways are an important class of molecular networks consisting of compounds, enzymes and their interactions. The understanding of global metabolic pathways is extremely important for various applications in ecology and pharmacology. However, large parts of metabolic pathways remain unknown, and most organism-specific pathways contain many missing enzymes.

Although several databases contain data on many metabolites and reactions in biochemical pathways, there is still a large gap between the numbers of experimentally identified enzymes and metabolites. Many catalytic enzyme genes are presumably still unknown. Although previous studies have estimated the number of candidate enzyme genes, they required additional information beyond the structures of metabolites, such as gene expression and gene order in the genome.

The identification of beneficial drug combinations is a challenging issue in pharmaceutical and clinical research toward combinatorial drug therapy. In the present study, we developed a novel computational method for large-scale prediction of beneficial drug combinations using drug efficacy and target profiles. We designed an informative descriptor for each drug-drug pair based on multiple drug profiles representing drug-targeted proteins and Anatomical Therapeutic Chemical Classification System codes.
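
A sketch of an order-invariant drug-drug pair descriptor, assuming each drug is described by a binary target profile and a binary ATC-code profile. Summing the two drugs' profiles element-wise gives the same descriptor for (A, B) and (B, A); this specific encoding and the function name are assumptions, not necessarily the descriptor designed in the study.

```python
from typing import List, Sequence


def pair_descriptor(targets_a: Sequence[int], targets_b: Sequence[int],
                    atc_a: Sequence[int], atc_b: Sequence[int]) -> List[int]:
    """Order-invariant descriptor for a drug pair.

    Each position is the element-wise sum of the two drugs' binary profiles
    (0 = neither drug, 1 = one drug, 2 = both drugs); target-profile positions
    come first, followed by ATC-code positions.
    """
    if len(targets_a) != len(targets_b) or len(atc_a) != len(atc_b):
        raise ValueError("profiles of the two drugs must have matching lengths")
    targets = [a + b for a, b in zip(targets_a, targets_b)]
    atc = [a + b for a, b in zip(atc_a, atc_b)]
    return targets + atc


# Hypothetical profiles over three target proteins and two ATC codes.
print(pair_descriptor([1, 0, 1], [1, 1, 0], [0, 1], [0, 1]))  # [2, 1, 1, 0, 2]
```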

Cancer is no longer rare anywhere in the world, and its global burden continues to grow substantially every year. Previous research on infections and cancers reported that about 17.8% of cancers worldwide, which are over 1.

Motivation: Recent advances in mass spectrometry and related metabolomics technologies have enabled the rapid and comprehensive analysis of numerous metabolites. However, biosynthetic and biodegradation pathways are only known for a small portion of metabolites, with most metabolic pathways remaining uncharacterized.

Results: In this study, we developed a novel method for supervised de novo metabolic pathway reconstruction with an improved graph alignment-based approach in the reaction-filling framework.

The identification of drug-target interactions, or interactions between drug candidate compounds and target candidate proteins, is a crucial process in genomic drug discovery. In silico chemogenomic methods have recently been recognized as a promising approach for genome-wide prediction of drug-target interactions, but their prediction performance depends heavily on the descriptors and similarity measures of drugs and proteins. In this paper, we investigated the performance of various descriptors and similarity measures of drugs and proteins for drug-target interaction prediction using a chemogenomic approach.
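
The Tanimoto coefficient is a widely used similarity measure for binary chemical fingerprints and a typical example of the drug similarity measures such comparisons cover. This sketch represents each fingerprint as a set of on-bit indices; the fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two binary fingerprints given as on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_matrix(fingerprints: List[Set[int]]) -> List[List[float]]:
    """All-pairs Tanimoto similarity, e.g., a drug-drug similarity matrix."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


# Hypothetical fingerprints for three drugs.
fps = [{1, 2, 3}, {2, 3, 4}, {7}]
for row in similarity_matrix(fps):
    print(row)
```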

Genomics is faced with the issue of many partially annotated putative enzyme-encoding genes for which activities have not yet been verified, while metabolomics is faced with the issue of many putative enzyme reactions for which full equations have not been verified. Knowledge of enzymes has been collected by the IUBMB and made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively by bioinformatics studies.

Motivation: Metabolic pathway analysis is crucial not only in metabolic engineering but also in rational drug design. However, biosynthetic/biodegradation pathways are known for only a small portion of metabolites, and a vast number of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome scale.

In recent years, the Semantic Web has become the focus of life science database development as a means to link life science data in an effective and efficient manner. In order for carbohydrate data to be applied to this new technology, there are two requirements for carbohydrate data representations: (1) a linear notation which can be used as a URI (Uniform Resource Identifier) if needed and (2) a unique notation such that any published glycan structure can be represented distinctively. This latter requirement includes the possible representation of nonstandard monosaccharide units as a part of the glycan structure, as well as compositions, repeating units, and ambiguous structures where linkages/linkage positions are unidentified.

Article Synopsis
  • DINIES is a web server that predicts unknown drug-target interactions using various biological data, like chemical structures and amino acid sequences, leveraging supervised machine learning methods.
  • The server allows users to upload similarity matrices of drugs and proteins and choose between known KEGG database interactions or their own data for training predictive models.
  • DINIES also offers integration with the KEGG database for analysis of biological pathways, functional hierarchies, and diseases, making it a valuable tool for researchers.
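
As an illustration of supervised prediction from an uploaded similarity matrix, a simple baseline is the weighted-profile score: a new drug's score for each protein is the similarity-weighted average of the known drugs' interaction profiles. This sketch is not necessarily one of the algorithms DINIES offers; the function name and inputs are hypothetical.

```python
from typing import List, Sequence


def weighted_profile(sim_to_known: Sequence[float],
                     known_interactions: Sequence[Sequence[int]]) -> List[float]:
    """Predict interaction scores for a new drug against each protein.

    sim_to_known[j]: similarity between the new drug and known drug j.
    known_interactions[j][p]: 1 if known drug j interacts with protein p, else 0.
    Score for protein p = sum_j sim_j * Y[j][p] / sum_j sim_j.
    """
    if not known_interactions or len(sim_to_known) != len(known_interactions):
        raise ValueError("need one similarity value per known drug")
    n_proteins = len(known_interactions[0])
    total = sum(sim_to_known)
    if total == 0:
        return [0.0] * n_proteins
    return [sum(s * row[p] for s, row in zip(sim_to_known, known_interactions)) / total
            for p in range(n_proteins)]


# Hypothetical: the new drug resembles known drug 0 more than known drug 1.
print(weighted_profile([0.8, 0.2], [[1, 0], [0, 1]]))  # [0.8, 0.2]
```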

Background: Most phenotypic effects of drugs involve interactions between drugs and their target proteins; however, our knowledge of the molecular mechanisms of drug-target interactions is very limited. One of the challenging issues in recent pharmaceutical science is to identify the underlying molecular features that govern drug-target interactions.

Results: In this paper, we make a systematic analysis of the correlation between drug side effects and protein domains, which we call "pharmacogenomic features," based on the drug-target interaction network.

Background: To develop hypotheses on unknown metabolic pathways, biochemists frequently rely on literature that uses free text to describe functional groups or substructures. In computational chemistry or cheminformatics, molecules are typically represented by chemical descriptors, i.e.
