An Integrated Pipeline for Phenotypic Characterization, Clustering and Visualization of Patient Cohorts in a Rare Disease-Oriented Clinical Data Warehouse.

Xiaoyi Chen, Junyuan Wang, Carole Faviez, Xiaomeng Wang, Marc Vincent, Rosy Tsopra, Anita Burgun, Nicolas Garcelon

Stud Health Technol Inform

Data Science Platform, Imagine Institute, Université Paris Cité, Inserm UMR 1163, Paris, France.

Published: August 2024

Rare diseases pose significant challenges because they are heterogeneous and knowledge about them is limited. This study develops a comprehensive pipeline, interoperable with a document-oriented clinical data warehouse, that integrates cohort characterization, patient clustering and interpretation. Leveraging natural language processing (NLP), semantic similarity, machine learning and visualization, the pipeline enables the identification of prevalent phenotype patterns and the stratification of patients. To enhance interpretability, the discriminant phenotypes characterizing each cluster are provided. Users can visually test hypotheses by marking patients whose electronic health records (EHR) contain specific keywords, such as genes, drugs and procedures. Implemented through a web interface, the pipeline enables clinicians to navigate through different modules, discover intricate patterns and generate interpretable insights that may advance the understanding of rare diseases, guide decision-making, and ultimately improve patient outcomes.
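The abstract does not give implementation details, so the following is only a minimal sketch of the three steps it names (similarity-based clustering, discriminant phenotypes per cluster, keyword marking), under stated assumptions. Jaccard overlap is swapped in for the semantic similarity the authors use, average-linkage agglomerative clustering stands in for their unspecified machine-learning method, and every function name, patient identifier, phenotype label and note text below is hypothetical toy data.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Set-overlap similarity; a simple stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_patients(phenotypes, k):
    """Average-linkage agglomerative clustering down to k clusters."""
    clusters = [[pid] for pid in sorted(phenotypes)]
    while len(clusters) > k:
        best_pair, best_sim = None, -1.0
        for i, j in combinations(range(len(clusters)), 2):
            sims = [jaccard(phenotypes[p], phenotypes[q])
                    for p in clusters[i] for q in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg > best_sim:
                best_pair, best_sim = (i, j), avg
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def discriminant_phenotypes(phenotypes, cluster, top_n=2):
    """Rank terms by frequency inside the cluster minus frequency outside."""
    inside = [phenotypes[p] for p in cluster]
    outside = [s for p, s in phenotypes.items() if p not in cluster]
    in_counts = Counter(t for s in inside for t in s)
    out_counts = Counter(t for s in outside for t in s)
    scores = {}
    for term, n in in_counts.items():
        f_out = out_counts[term] / len(outside) if outside else 0.0
        scores[term] = n / len(inside) - f_out
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def mark_patients(notes, keyword):
    """Return ids of patients whose free-text notes mention the keyword."""
    kw = keyword.lower()
    return {pid for pid, text in notes.items() if kw in text.lower()}


# Hypothetical toy cohort: patient id -> set of extracted phenotype labels.
cohort = {
    "P1": {"seizure", "microcephaly", "hypotonia"},
    "P2": {"seizure", "microcephaly"},
    "P3": {"seizure", "microcephaly", "hypotonia"},
    "P4": {"nephropathy", "retinal dystrophy", "polydactyly"},
    "P5": {"nephropathy", "retinal dystrophy"},
    "P6": {"nephropathy", "retinal dystrophy", "polydactyly"},
}
# Hypothetical note snippets used only to illustrate keyword marking.
notes = {
    "P1": "Started on valproate after second seizure.",
    "P4": "Renal biopsy scheduled.",
    "P5": "Valproate not indicated.",
}

clusters = cluster_patients(cohort, k=2)
for members in clusters:
    print(sorted(members), discriminant_phenotypes(cohort, members))
print(sorted(mark_patients(notes, "valproate")))
```

In a real setting the phenotype sets would come from NLP over the warehouse documents, and the marked patients would be highlighted on the cluster visualization rather than printed.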

DOI: http://dx.doi.org/10.3233/SHTI240777

