High-quality gene/disease embedding in a multi-relational heterogeneous graph after a joint matrix/tensor decomposition.

J Biomed Inform

Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China.

Published: February 2022

Motivation: Node embedding of biological entity networks has been widely investigated for downstream applications. To embed the full semantics of genes and diseases, we consider a multi-relational heterogeneous graph in which uni-relational links between genes/diseases and other heterogeneous entities are abundant, while multi-relational links between genes and diseases are relatively sparse. This novel graph format calls for a dedicated data integration algorithm that fully captures the graph information and yields high-quality embeddings.
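The graph format described above can be pictured as a set of binary matrices (one per uni-relational entity pair) plus a three-way tensor for the multi-relational gene-disease triples. The following is a minimal sketch under stated assumptions: the entity names, relation labels, and edges are hypothetical toy values, not taken from the paper's dataset, and the layout is one plausible encoding rather than the authors' actual data format.

```python
import numpy as np

# Hypothetical toy identifiers; none of these come from the published network.
genes = ["g0", "g1", "g2"]
diseases = ["d0", "d1"]
chemicals = ["c0", "c1", "c2", "c3"]
relations = ["r0", "r1"]  # hypothetical gene-disease relation types

# Uni-relational links: one binary matrix per pair of entity types.
gene_chemical = np.zeros((len(genes), len(chemicals)))
for g, c in [(0, 0), (0, 2), (1, 1), (2, 0), (2, 3)]:
    gene_chemical[g, c] = 1.0

# Multi-relational links: (gene, relation, disease) triples stored in a
# gene x disease x relation tensor.
triples = [(0, 0, 0), (1, 1, 1)]
gene_disease = np.zeros((len(genes), len(diseases), len(relations)))
for g, r, d in triples:
    gene_disease[g, d, r] = 1.0


def density(a: np.ndarray) -> float:
    """Fraction of non-zero entries in a binary array."""
    return float(a.sum() / a.size)


print("gene-chemical density:", density(gene_chemical))
print("gene-disease density:", density(gene_disease))
```

In this toy setup the uni-relational matrix is denser than the multi-relational tensor, mirroring the abundant-versus-sparse contrast that motivates integrating both sources.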

Results: First, a typical multi-relational triple dataset was introduced, which carries significant associations between genes and diseases. Second, we curated all human genes and diseases in seven mainstream datasets and constructed a large-scale gene-disease network comprising 163,024 nodes and 25,265,607 edges, covering 27,165 genes, 2,665 diseases, 15,067 chemicals, 108,023 mutations, 2,363 pathways, and 7,732 phenotypes. Third, we proposed a Joint Decomposition of Heterogeneous Matrix and Tensor (JDHMT) model, which integrates all heterogeneous data resources and obtains an embedding for each gene or disease. Fourth, a visualized intrinsic evaluation was performed, which examined the embeddings in terms of interpretable data clustering. Furthermore, an extrinsic evaluation was performed in the form of link prediction. Both intrinsic and extrinsic evaluation results showed that the JDHMT model outperformed eleven other state-of-the-art (SOTA) methods from the relation-learning, proximity-preserving, and message-passing paradigms. Finally, the constructed gene-disease network, embedding results, and code were made available.
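To illustrate the general idea of jointly decomposing a matrix and a tensor that share an entity type, the sketch below fits a generic coupled factorization by plain gradient descent. It is not the JDHMT objective or the authors' code (the abstract does not specify either): the CP tensor model, the squared-error loss, the function names, and all hyperparameters are assumptions chosen for illustration. The gene factor G is shared between the gene-chemical matrix and the gene-disease-relation tensor, so both sources shape the gene embedding.

```python
import numpy as np


def coupled_factorize(X, T, rank=4, steps=1000, lr=0.01, seed=0):
    """Generic coupled matrix-tensor factorization (illustrative only).

    X: gene x chemical matrix, approximated by G @ C.T
    T: gene x disease x relation tensor, approximated by a CP model
       T[i, j, l] ~ sum_k G[i, k] * D[j, k] * R[l, k]
    G is shared by both terms. The constant factor 2 of the squared-error
    gradient is absorbed into the learning rate.
    """
    rng = np.random.default_rng(seed)
    n_g, n_c = X.shape
    _, n_d, n_r = T.shape
    G = 0.1 * rng.standard_normal((n_g, rank))
    C = 0.1 * rng.standard_normal((n_c, rank))
    D = 0.1 * rng.standard_normal((n_d, rank))
    R = 0.1 * rng.standard_normal((n_r, rank))
    losses = []
    for _ in range(steps):
        Ex = G @ C.T - X                              # matrix residual
        Et = np.einsum("ik,jk,lk->ijl", G, D, R) - T  # tensor residual
        losses.append(float((Ex ** 2).sum() + (Et ** 2).sum()))
        # Compute every gradient before updating any factor.
        gG = Ex @ C + np.einsum("ijl,jk,lk->ik", Et, D, R)
        gC = Ex.T @ G
        gD = np.einsum("ijl,ik,lk->jk", Et, G, R)
        gR = np.einsum("ijl,ik,jk->lk", Et, G, D)
        G -= lr * gG
        C -= lr * gC
        D -= lr * gD
        R -= lr * gR
    return G, D, R, losses


def score(G, D, R, g, d, r):
    """CP score of a candidate (gene g, disease d, relation r) link."""
    return float(np.sum(G[g] * D[d] * R[r]))


# Hypothetical toy data: 3 genes, 4 chemicals, 2 diseases, 2 relation types.
X = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 1]], dtype=float)
T = np.zeros((3, 2, 2))
T[0, 0, 0] = 1.0
T[1, 1, 1] = 1.0

G, D, R, losses = coupled_factorize(X, T)
print("initial loss:", losses[0], "final loss:", losses[-1])
print("score for (gene 0, disease 0, relation 0):", score(G, D, R, 0, 0, 0))
```

Rows of G and D serve as gene and disease embeddings, and `score` ranks candidate triples, which is the form of link prediction an extrinsic evaluation would measure. A practical model would add regularization, negative sampling, and a held-out test split.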

Data and Code Availability: The constructed massive gene-disease network is available at: https://hzaubionlp.com/heterogeneous-biological-network/. The code is available at: https://github.com/bionlp-hzau/JDHMT.

Source: http://dx.doi.org/10.1016/j.jbi.2021.103973
