DIVIS: a semantic DIstance to improve the VISualisation of heterogeneous phenotypic datasets.

Rayan Eid, Claudine Landès, Alix Pernet, Emmanuel Benoît, Pierre Santagostini, Angelina El Ghaziri, Julie Bourbeillon

BioData Min

Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France.

Published: April 2022

Background: Thanks to the wider spread of high-throughput experimental techniques, biologists are accumulating large numbers of datasets, which often mix quantitative and qualitative variables and are not always complete, in particular when they concern phenotypic traits. To gain a first insight into these datasets and reduce the size of the data matrices, scientists often rely on multivariate analysis techniques. However, such approaches are not always easy to apply, in particular to mixed datasets. Moreover, displaying large numbers of individuals leads to cluttered visualisations that are difficult to interpret.

Results: We introduced a new methodology to overcome these limits. Its main feature is a new semantic distance tailored to both quantitative and qualitative variables, which allows for a realistic representation of the relationships between individuals (phenotypic descriptions in our case). This semantic distance is based on ontologies engineered to represent real-life knowledge about the underlying variables. For easier handling by biologists, we incorporated it into a complete tool that goes from raw data file to visualisation. Following the distance calculation, the tool (i) groups similar individuals, (ii) represents each group by emblematic individuals that we call archetypes and (iii) builds sparse visualisations based on these archetypes. Our approach was implemented as a Python pipeline and applied to a rosebush dataset including passport and phenotypic data.
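
The three steps listed in the Results can be illustrated with a toy sketch. This is not the DIVIS implementation: the abstract does not give the distance formula, so the sketch assumes a simple stand-in (edge count through the lowest common ancestor of a small tree ontology for qualitative values, range-scaled absolute difference for quantitative values, averaged over the variables that are not missing). The ontology, variable names, data, threshold, single-linkage grouping and medoid-based archetypes are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical toy ontology for one qualitative variable (child -> parent).
COLOUR_PARENT = {
    "colour": None,
    "red": "colour",
    "light red": "red",
    "dark red": "red",
    "yellow": "colour",
}

def ancestors(term, parent):
    """Return the path from a term up to the ontology root, term included."""
    path = []
    while term is not None:
        path.append(term)
        term = parent[term]
    return path

def term_distance(a, b, parent):
    """Edge count between two terms through their lowest common ancestor."""
    path_a, path_b = ancestors(a, parent), ancestors(b, parent)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            return steps_a + path_b.index(node)
    raise ValueError("terms do not share a root")

def mixed_distance(x, y, spec):
    """Average per-variable distance in [0, 1], skipping missing values."""
    parts = []
    for name, (kind, info) in spec.items():
        if x.get(name) is None or y.get(name) is None:
            continue  # incomplete data: ignore this variable for this pair
        if kind == "quantitative":
            low, high = info
            parts.append(abs(x[name] - y[name]) / (high - low))
        else:
            parent, max_edges = info
            parts.append(term_distance(x[name], y[name], parent) / max_edges)
    # Assumption: pairs with no shared variable are treated as maximally distant.
    return sum(parts) / len(parts) if parts else 1.0

def group(individuals, spec, threshold):
    """Single-linkage grouping: merge individuals closer than the threshold."""
    names = list(individuals)
    label = {n: n for n in names}
    for a, b in combinations(names, 2):
        if mixed_distance(individuals[a], individuals[b], spec) <= threshold:
            old, new = label[b], label[a]
            for n in names:
                if label[n] == old:
                    label[n] = new
    groups = {}
    for n in names:
        groups.setdefault(label[n], []).append(n)
    return list(groups.values())

def archetype(members, individuals, spec):
    """Medoid: the member with the smallest total distance to the others."""
    return min(members, key=lambda m: sum(
        mixed_distance(individuals[m], individuals[o], spec) for o in members))

# Hypothetical variable descriptions: value range, or ontology plus its longest path.
SPEC = {
    "petal_count": ("quantitative", (5, 45)),
    "colour": ("qualitative", (COLOUR_PARENT, 3)),
}
# Hypothetical individuals; "E" has a missing quantitative value.
ROSES = {
    "A": {"petal_count": 10, "colour": "light red"},
    "B": {"petal_count": 12, "colour": "dark red"},
    "C": {"petal_count": 14, "colour": "red"},
    "D": {"petal_count": 40, "colour": "yellow"},
    "E": {"petal_count": None, "colour": "yellow"},
}

if __name__ == "__main__":
    for members in group(ROSES, SPEC, threshold=0.4):
        print(members, "->", archetype(members, ROSES, SPEC))
```

With this toy data and a threshold of 0.4, the red roses A, B and C fall into one group and the yellow roses D and E into another. Plotting only one archetype per group, rather than every individual, is what keeps the resulting visualisation sparse.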

Conclusions: The introduction of our new semantic distance and of the archetype concept allowed us to build a comprehensive representation of an incomplete dataset characterised by a large proportion of qualitative data. The methodology described here could have wider use beyond information characterising organisms or species, and beyond plant science. Indeed, the same approach could be applied to any mixed dataset.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8981856
DOI: http://dx.doi.org/10.1186/s13040-022-00293-y

Similar Publications

Utility of word embeddings from large language models in medical diagnosis.

J Am Med Inform Assoc

January 2025

Kennewick, WA 99338, United States.

Shahram Yazdani, Ronald Claude Henry, Avery Byrne, Isaac Claude Henry

Objective: This study evaluates the utility of word embeddings, generated by large language models (LLMs), for medical diagnosis by comparing the semantic proximity of symptoms to their eponymic disease embedding ("eponymic condition") and the mean of all symptom embeddings associated with a disease ("ensemble mean").

Materials and Methods: Symptom data for 5 diagnostically challenging pediatric diseases (CHARGE syndrome, Cowden disease, POEMS syndrome, rheumatic fever, and tuberous sclerosis) were collected from PubMed. Using the Ada-002 embedding model, disease names and symptoms were translated into vector representations in a high-dimensional space.

HKAN: A Hybrid Kolmogorov-Arnold Network for Robust Fabric Defect Segmentation.

Sensors (Basel)

December 2024

School of Computer and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China.

Min Li, Pei Ye, Shuqin Cui, Ping Zhu, Junping Liu

Currently, fabric defect detection methods predominantly rely on CNN models. However, due to the inherent limitations of CNNs, such models struggle to capture long-distance dependencies in images and fail to accurately detect complex defect features. While Transformers excel at modeling long-range dependencies, their quadratic computational complexity poses significant challenges.

The structure of meaning in schizophrenia: A study of spontaneous speech in Chinese.

Psychiatry Res

December 2024

Department of Translation and Language Sciences, Universitat Pompeu Fabra, Barcelona, Spain; Catalan Institute for Advanced Studies and Research (ICREA), Barcelona, Spain.

Han Zhang, Rui He, Claudio Palominos, Ning Hsu, Hintat Cheung

Narrative speech production requires the retrieval of concepts to refer to entities, which need to be referenceable more than once for any form of narrative coherence to arise. Such coherence has long been observed to be affected in schizophrenia spectrum disorders (SSD), yet the underlying mechanisms have been a longstanding puzzle, with existing evidence predominantly derived from Indo-European languages. Here we analyzed two picture descriptions from 22 native Mandarin Chinese speakers with SSD and 15 healthy controls.

Neighborhood Topology-Aware Knowledge Graph Learning and Microbial Preference Inferring for Drug-Microbe Association Prediction.

J Chem Inf Model

January 2025

Department of Computer Science and Technology, Shantou University, Shantou 515063, China.

Jing Gu, Tiangang Zhang, Yihang Gao, Sentao Chen, Yuxin Zhang

The human microbiota may influence the effectiveness of drug therapy by activating or inactivating the pharmacological properties of drugs. Computational methods have demonstrated their ability to screen reliable microbe-drug associations and uncover the mechanism by which drugs exert their functions. However, the previous prediction methods failed to completely exploit the neighborhood topologies of the microbe and drug entities and the diverse correlations between the microbe-drug entity pair and the other entities.

Robust multi-source geographic entities matching by maximizing geometric and semantic similarity.

Sci Rep

December 2024

Department of Geographic Information System, Chinese Academy of Surveying and Mapping, Beijing, 100036, China.

YuHan Yan, PengDa Wu, Yong Yin, PeiPei Guo

Geographic entity matching is an important means for multi-source spatial data fusion and information association and sharing. Corresponding matching methods have been designed by existing studies for different types of entity data characteristics, such as line and area. However, these approaches are often limited in the generalization ability for matching heterogeneous data from multiple sources and the accuracy for complex pattern matching.
