ShortcutLens: A Visual Analytics Approach for Exploring Shortcuts in Natural Language Understanding Dataset.

Zhihua Jin, Xingbo Wang, Furui Cheng, Chunhui Sun, Qun Liu, Huamin Qu

IEEE Trans Vis Comput Graph

Published: July 2024

Benchmark datasets play an important role in evaluating Natural Language Understanding (NLU) models. However, shortcuts (unwanted biases in the benchmark datasets) can damage the effectiveness of benchmark datasets in revealing models' real capabilities. Since shortcuts vary in coverage, productivity, and semantic meaning, it is challenging for NLU experts to systematically understand and avoid them when creating benchmark datasets. In this paper, we develop a visual analytics system, ShortcutLens, to help NLU experts explore shortcuts in NLU benchmark datasets. The system allows users to conduct multi-level exploration of shortcuts. Specifically, Statistics View helps users grasp statistics such as the coverage and productivity of shortcuts in the benchmark dataset. Template View employs hierarchical and interpretable templates to summarize different types of shortcuts. Instance View allows users to check the corresponding instances covered by the shortcuts. We conduct case studies and expert interviews to evaluate the effectiveness and usability of the system. The results demonstrate that ShortcutLens helps users better understand benchmark dataset issues through shortcuts, inspiring them to create challenging and pertinent benchmark datasets.
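
The abstract names coverage and productivity without defining them. As a rough illustration only, the sketch below assumes that coverage is the share of instances containing a candidate shortcut token and that productivity is the share of those instances carrying their most frequent label. These definitions, the function name `shortcut_stats`, and the toy dataset are all assumptions made for illustration and are not taken from ShortcutLens itself.

```python
from collections import Counter


def shortcut_stats(dataset, token):
    """Return (coverage, productivity) of `token` over (text, label) pairs.

    Assumed definitions (not from the paper's abstract):
      coverage     = fraction of instances whose text contains the token
      productivity = fraction of those instances with the majority label
    """
    # Labels of every instance whose whitespace-split text contains the token.
    labels = [label for text, label in dataset if token in text.lower().split()]
    if not labels:
        return 0.0, 0.0
    coverage = len(labels) / len(dataset)
    productivity = Counter(labels).most_common(1)[0][1] / len(labels)
    return coverage, productivity


# Hypothetical toy sentiment dataset, invented for this example.
toy = [
    ("the movie was not good", "negative"),
    ("not worth the ticket", "negative"),
    ("i could not stop smiling", "positive"),
    ("a warm and funny film", "positive"),
]

coverage, productivity = shortcut_stats(toy, "not")
print(f"coverage={coverage:.2f}, productivity={productivity:.2f}")
```

In this toy data the token "not" appears in three of four instances, and two of those three share the label "negative". A token that is both widespread and strongly tied to one label is the kind of candidate a dataset creator might want to inspect more closely.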

DOI: http://dx.doi.org/10.1109/TVCG.2023.3236380
