Publications by authors named "Hisashi Kashima"

In complex systems, it's crucial to uncover latent mechanisms and their context-dependent relationships. This is especially true in medical research, where identifying unknown cancer mechanisms and their impact on phenomena like drug resistance is vital. Directly observing these mechanisms is challenging due to measurement complexities, leading to an approach that infers latent mechanisms from observed variable distributions.

Predicting the chemical properties of compounds is crucial in discovering novel materials and drugs with specific desired characteristics. Recent advances in machine learning have enabled automatic predictive modeling from past experimental data reported in the literature. However, these datasets are often biased for various reasons, such as experimental plans and publication decisions, and prediction models trained on such biased datasets often overfit to the biased distributions and perform poorly in subsequent use.

Recently, machine learning has been studied as a means of automatically controlling anesthesia, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate the decision of whether to increase the flow rate at each time point during the continuous administration of the analgesic remifentanil as a supervised binary classification problem. Experiments were conducted to evaluate prediction performance with six machine learning models: logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM), using data from 210 cases collected during actual surgeries.
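The binary formulation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the only input is the flow-rate series itself and that a fixed window of recent rates serves as the feature vector; the function name `make_increase_labels`, the `window` parameter, and the sample values are hypothetical, and the features used in the actual study are not given in this excerpt.

```python
from typing import List, Tuple


def make_increase_labels(
    flow_rates: List[float], window: int = 3
) -> Tuple[List[List[float]], List[int]]:
    """Turn a flow-rate series into (features, labels) for binary classification.

    features[i] holds the `window` most recent flow rates up to time t
    (an illustrative choice, not the study's feature set);
    labels[i] is 1 if the rate at time t+1 is higher than at time t, else 0.
    """
    features: List[List[float]] = []
    labels: List[int] = []
    # Start once a full window is available; stop one step early so t+1 exists.
    for t in range(window - 1, len(flow_rates) - 1):
        features.append(flow_rates[t - window + 1 : t + 1])
        labels.append(1 if flow_rates[t + 1] > flow_rates[t] else 0)
    return features, labels


# Made-up flow rates, one value per time point.
rates = [0.10, 0.10, 0.15, 0.15, 0.15, 0.20]
X, y = make_increase_labels(rates, window=3)
print(y)  # [0, 0, 1]
```

Each `(X[i], y[i])` pair could then be passed to any of the six classifiers listed above; a sequence model such as an LSTM would typically consume the windows in order rather than as independent rows.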

Background: Predicting interactions between chemical compounds is one of the fundamental tasks in bioinformatics and chemoinformatics, because it contributes to various applications in metabolic engineering and drug discovery. The recent rapid growth in the amount of available data has enabled the application of computational approaches such as statistical modeling and machine learning methods. Both a set of chemical interactions and the structures of chemical compounds are represented as graphs, and various graph-based approaches, including graph convolutional neural networks, have been successfully applied to chemical network prediction.

Recently, the prospect of using machine learning tools to automate the annotation and analysis of large-scale sequences from next-generation sequencers has attracted the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One solution to this problem is to utilise the power of crowdsourcing.

Synthetic accessibility evaluation is the process of assessing how easily a compound can be synthesized. A rapid method for assessing the synthetic accessibility of a vast number of chemical compounds is expected to bring about a breakthrough in drug discovery. Although several computational methods have been proposed, compounds are still evaluated by medicinal chemists, and the low throughput of human evaluation caused by the shortage of chemists is a critical issue when handling a large number of compounds.

Well-trained clinicians may be able to provide a diagnosis and prognosis from very short biomarker series using information and experience gained from previous patients. Although mathematical methods can potentially help clinicians predict the progression of diseases, there is so far no method that uses information from previous patients to estimate a patient's state from a very short time series of a biomarker for diagnosis and/or prognosis. Here, we propose a mathematical framework for integrating other patients' datasets to infer and predict the state of the disease in the current patient based on their short history.

Background: The prevalence of non-communicable diseases is increasing throughout the world, including developing countries.

Objective: The aim was to study a preventive medical service in a developing country that combines eHealth checkups and teleconsultation, and to assess stratification rules and the short-term effects of intervention.

Methods: We developed an eHealth system that comprises a set of sensor devices in an attaché case, a data transmission system linked to a mobile network, and a data management application.

Accurate prediction of protein-ligand binding affinities for lead optimization in drug discovery remains an important and challenging problem for scoring functions in docking simulation. In this paper, we propose a data-driven approach that integrates multiple scoring functions to predict protein-ligand binding affinity directly. We then propose a new method called multiple instance regression based scoring (MIRS) that incorporates unbound ligand conformations using multiple scoring functions.

Background: High-throughput methods for detecting protein-protein interactions enable us to obtain large interaction networks, and also allow us to computationally identify the associations of proteins as protein complexes. Although there are methods to extract protein complexes as sets of proteins from interaction networks, the extracted complexes may include false positives because these methods do not account for the structural limitations of the proteins and thus do not check that the proteins in an extracted complex can bind to each other simultaneously. In addition, few studies have sought deeper insights into protein complexes, such as the topology of the protein-protein interactions or the domain-domain interactions that mediate them.

Background: Understanding secondary metabolic pathways in plants is essential for finding druggable candidate enzymes. However, there are many enzymes whose functions in organism-specific metabolic pathways have not yet been discovered. To identify the functions of those enzymes, assigning EC numbers to the enzymatic reactions they catalyze plays a key role, since EC numbers categorize enzymes on the one hand and enzymatic reactions on the other.

Motivation: The existing supervised methods for biological network inference work on each of the networks individually based only on intra-species information such as gene expression data. We believe that it will be more effective to use genomic data and cross-species evolutionary information from different species simultaneously, rather than to use the genomic data alone.

Results: We created a new semi-supervised learning method called Link Propagation for inferring biological networks of multiple species based on genome-wide data and evolutionary information.

We propose a novel general-purpose tree kernel and apply it to glycan structure analysis. Our kernel measures the similarity between two labeled trees by counting the number of common q-length substrings (tree q-grams) embedded in the trees for all possible lengths q. We apply our tree kernel using a support vector machine (SVM) to classification and specific feature extraction from glycan structure data.
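The idea of counting common tree q-grams can be sketched as follows. This is a simplified illustration, not the kernel from the paper: it assumes a q-gram is a downward path of q node labels (ancestor to descendant), applies no weighting across different q, and represents a tree as a hypothetical `(label, [children])` tuple. The residue names in the example are placeholders.

```python
from collections import Counter


def path_counts(tree) -> Counter:
    """Count every downward label path (all lengths q >= 1) in a labeled tree.

    A tree is a (label, [child trees]) tuple; this representation is an
    assumption made for the sketch.
    """
    counts: Counter = Counter()

    def visit(node, open_paths):
        label, children = node
        # Extend each path that ends at the parent, and start a new path here.
        paths_here = [p + (label,) for p in open_paths] + [(label,)]
        counts.update(paths_here)
        for child in children:
            visit(child, paths_here)

    visit(tree, [])
    return counts


def qgram_kernel(t1, t2) -> int:
    """Sum, over all shared paths, the product of their occurrence counts."""
    c1, c2 = path_counts(t1), path_counts(t2)
    return sum(n * c2[p] for p, n in c1.items())


# Two small placeholder trees with monosaccharide-like labels.
t1 = ("Man", [("GlcNAc", []), ("Man", [("Gal", [])])])
t2 = ("Man", [("Gal", [])])
print(qgram_kernel(t1, t2))  # 4
```

Because the value is an inner product of count vectors, the resulting kernel matrix is symmetric and positive semidefinite, which is what allows it to be used with an SVM as a precomputed kernel.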

Motivation: Clustering sequences of a full-length cDNA library into alternative splice form candidates is a very important problem.

Results: We developed a new efficient algorithm to cluster sequences of a full-length cDNA library into alternative splice form candidates. Current clustering algorithms for cDNAs tend to produce too many clusters containing incorrect splice form candidates.
