Interpretability Is in the Mind of the Beholder: A Causal Framework for Human-Interpretable Representation Learning.

Entropy (Basel)

Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy.

Published: November 2023

Research on Explainable Artificial Intelligence has recently started exploring the idea of producing explanations that, rather than being expressed in terms of low-level features, are encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is only interpretable to the extent that the human at the receiving end can actually understand it. The key challenge in human-interpretable representation learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine's representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglement. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.
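
To make the information-theoretic reformulation mentioned above concrete, here is a minimal illustrative sketch; the symbols (a machine representation \hat{Z}, a human concept vocabulary G, and a matching \pi between them) are assumptions chosen for exposition, not the paper's exact notation. Alignment can be read as each learned unit being informative about exactly one human concept, with leakage showing up as residual conditional information:

\[
I\bigl(\hat{Z}_j ;\, G_{\pi(j)}\bigr) \gg 0
\qquad \text{(unit } j \text{ is informative about its matched concept)},
\]
\[
I\bigl(\hat{Z}_j ;\, G_k \,\big|\, G_{\pi(j)}\bigr) \approx 0
\quad \text{for all } k \neq \pi(j)
\qquad \text{(no leakage of other concepts into unit } j\text{)}.
\]

Under this reading, disentanglement corresponds to the special case where the matching \pi is one-to-one across all units, and concept leakage corresponds to the second quantity being bounded away from zero.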

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10742865
DOI: http://dx.doi.org/10.3390/e25121574

Publication Analysis

Top Keywords

representation learning (12)
human-interpretable representation (8)
post hoc (8)
hoc explainers (8)
neural networks (8)
human (5)
interpretability (4)
interpretability mind (4)
mind beholder (4)
beholder causal (4)

Similar Publications

Pulse approach: a physics-guided machine learning model for thermal analysis in laser-based powder bed fusion of metals.

Prog Addit Manuf

July 2024

Empa Swiss Federal Laboratories for Materials Science and Technology, Überlandstrasse 129, 8600 Dübendorf, Switzerland.

Fast and accurate representation of heat transfer in laser powder-bed fusion of metals (PBF-LB/M) is essential for thermo-mechanical analyses. For example, it enables the detection of thermal hotspots at the design stage. While traditional physics-based numerical approaches such as the finite element (FE) method are applicable to a wide variety of problems, they are computationally too expensive for PBF-LB/M due to the space- and time-discretization requirements.


Introduction: The study of attention has been pivotal in advancing our comprehension of cognition. The goal of this study is to investigate which EEG data representations or features are most closely linked to attention, and to what extent they can handle cross-subject variability.

Methods: We explore features obtained from the univariate time series of a single EEG channel, such as time-domain features and recurrence plots, as well as representations obtained directly from the multivariate time series, such as global field power or functional brain networks.
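
As a self-contained sketch of one of the representations named above (not code from the study; the array names and shapes are assumptions), global field power can be computed as the per-timepoint standard deviation across EEG channels:

import numpy as np

def global_field_power(eeg: np.ndarray) -> np.ndarray:
    # Global field power: standard deviation across channels at each timepoint.
    # eeg has shape (n_channels, n_timepoints); returns shape (n_timepoints,).
    return eeg.std(axis=0)

# Hypothetical usage on a simulated 32-channel, 1-second segment at 256 Hz.
rng = np.random.default_rng(0)
segment = rng.standard_normal((32, 256))
gfp = global_field_power(segment)
print(gfp.shape)  # (256,)

High values of this curve mark timepoints where scalp potentials diverge strongly across channels, which is why it serves as a compact summary of multivariate EEG.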


A variational graph-partitioning approach to modeling protein liquid-liquid phase separation.

Cell Rep Phys Sci

November 2024

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.

Graph neural networks (GNNs) have emerged as powerful tools for representation learning. Their efficacy depends on having an optimal underlying graph. In many cases, the most relevant information comes from specific subgraphs.
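
For orientation, below is a minimal sketch of the message-passing step underlying GNN representation learning (a generic mean-aggregation layer on a toy graph; the matrices, sizes, and weights are assumptions, not the paper's variational graph-partitioning model):

import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # adjacency of a toy 4-node graph
X = rng.standard_normal((4, 8))            # 8-dimensional node features

# Add self-loops and row-normalize so each node averages over its neighborhood.
A_hat = A + np.eye(4)
A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)

W = rng.standard_normal((8, 8)) / np.sqrt(8)  # weight matrix (random stand-in)
H = np.maximum(A_norm @ X @ W, 0.0)           # ReLU(aggregate, then transform)
print(H.shape)  # (4, 8): updated node representations

Restricting the aggregation to a subgraph amounts to zeroing the corresponding rows and columns of A, which is one way to see why the choice of underlying (sub)graph drives the quality of the learned representations.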


Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches have been proposed to achieve more accurate predictions. Accurate PSSP can be instrumental in inferring protein tertiary structure and function. Machine learning approaches, and deep learning in particular, show promising results for the PSSP problem.


Accurate drug-target binding affinity (DTA) prediction is crucial in drug discovery. Recently, deep learning methods for DTA prediction have made significant progress. However, two challenges remain: (1) recent models typically ignore the correlations within drug and target data when building drug/target representations, and (2) interaction learning for drug-target pairs is usually done by simple concatenation, which is insufficient to capture their fusion.
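
As a hedged sketch of the simple-concatenation baseline this snippet criticizes (the embedding sources, dimensions, and the affinity head are assumptions, not the paper's model):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned embeddings for one drug-target pair.
drug_embedding = rng.standard_normal(128)    # e.g., from a SMILES encoder
target_embedding = rng.standard_normal(256)  # e.g., from a protein-sequence encoder

# Simple concatenation: the two views are stacked side by side, so no
# cross-interaction between drug and target features is modeled explicitly.
pair = np.concatenate([drug_embedding, target_embedding])  # shape (384,)

# A linear affinity head standing in for an MLP regressor (weights random here).
w = rng.standard_normal(pair.shape[0]) / np.sqrt(pair.shape[0])
predicted_affinity = float(pair @ w)
print(predicted_affinity)

Richer fusion schemes replace the concatenation step with operations that mix the two embeddings, which is the gap the snippet points to.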

