Publications by authors named "Andreas Kerren"

Visualization for explainable and trustworthy machine learning remains one of the most important and heavily researched fields within information visualization and visual analytics with various application domains, such as medicine, finance, and bioinformatics. After our 2020 state-of-the-art report comprising 200 techniques, we have persistently collected peer-reviewed articles describing visualization techniques, categorized them based on the previously established categorization schema consisting of 119 categories, and provided the resulting collection of 542 techniques in an online survey browser. In this survey article, we present the updated findings of new analyses of this dataset as of fall 2023 and discuss trends, insights, and eight open challenges for using visualizations in machine learning.

View Article and Find Full Text PDF

Relational information between different types of entities is often modelled by a multilayer network (MLN) - a network with subnetworks represented by layers. The layers of an MLN can be arranged in different ways in a visual representation, however, the impact of the arrangement on the readability of the network is an open question. Therefore, we studied this impact for several commonly occurring tasks related to MLN analysis.

View Article and Find Full Text PDF

The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data-including complex feature engineering processes-to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to numerous improvements such as boosting the predictive results, decreasing computational times, reducing excessive noise, and increasing the transparency behind the decisions taken during the training. Despite that, while several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate.

View Article and Find Full Text PDF

In machine learning (ML), ensemble methods-such as bagging, boosting, and stacking-are widely-established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it may be a highly-effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process.

View Article and Find Full Text PDF

t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction.

View Article and Find Full Text PDF

Dimensionality reduction methods, also known as projections, are frequently used in multidimensional data exploration in machine learning, data science, and information visualization. Tens of such techniques have been proposed, aiming to address a wide set of requirements, such as ability to show the high-dimensional data structure, distance or neighborhood preservation, computational scalability, stability to data noise and/or outliers, and practical ease of use. However, it is far from clear for practitioners how to choose the best technique for a given use context.

View Article and Find Full Text PDF

Computer-assisted text coding can facilitate the analysis of large text collections. To evaluate the functionality of providing an analyst with a ranked list of suggestions for suitable text codes, we used a data set of discussion posts, which had been manually coded for reasons given for taking a stance on the topic of vaccination. We trained a logistic regression classifier to rank these reasons according to the probability that they would be present in the post.

View Article and Find Full Text PDF

Arguments used when vaccination is debated on Internet discussion forums might give us valuable insights into reasons behind vaccine hesitancy. In this study, we applied automatic topic modelling on a collection of 943 discussion posts in which vaccine was debated, and six distinct discussion topics were detected by the algorithm. When manually coding the posts ranked as most typical for these six topics, a set of semantically coherent arguments were identified for each extracted topic.

View Article and Find Full Text PDF

Data visualization is of increasing importance in the Biosciences. During the past 15 years, a great number of novel methods and tools for the visualization of biological data have been developed and published in various journals and conference proceedings. As a consequence, keeping an overview of state-of-the-art visualization research has become increasingly challenging for both biology researchers and visualization researchers.

View Article and Find Full Text PDF

Online social media are a perfect text source for stance analysis. Stance in human communication is concerned with speaker attitudes, beliefs, feelings and opinions. Expressions of stance are associated with the speakers' view of what they are talking about and what is up for discussion and negotiation in the intersubjective exchange.

View Article and Find Full Text PDF

Learning more about people mobility is an important task for official decision makers and urban planners. Mobility data sets characterize the variation of the presence of people in different places over time as well as movements (or flows) of people between the places. The analysis of mobility data is challenging due to the need to analyze and compare spatial situations (i.

View Article and Find Full Text PDF

The more-or-less artificial barrier between information visualization and scientific visualization hinders knowledge discovery. Having an integrated view of many aspects of the target data, including a seamlessly interwoven visual display of structural abstract data and 3D spatial information, could lead to new discoveries, insights, and scientific questions. Such a view also could reduce the user's cognitive load--that is, reduce the effort the user expends when comparing views.

View Article and Find Full Text PDF

Linnaeus University offers two master's courses in information visualization for computer science students with programming experience. This article briefly describes the syllabi, exercises, and practices developed for these courses.

View Article and Find Full Text PDF

Interaction between particles in so-called granular media, such as soil and sand, plays an important role in the context of geomechanical phenomena and numerous industrial applications. A two scale homogenization approach based on a micro and a macro scale level is briefly introduced in this paper. Computation of granular material in such a way gives a deeper insight into the context of discontinuous materials and at the same time reduces the computational costs.

View Article and Find Full Text PDF