Deep reinforcement learning (RL) agents often suffer from catastrophic forgetting, forgetting previously found solutions in parts of the input space when training new data. Replay memories are a common solution to the problem by decorrelating and shuffling old and new training samples. They naively store state transitions as they arrive, without regard for redundancy.
View Article and Find Full Text PDFTo enhance human-robot social interaction, it is essential for robots to process multiple social cues in a complex real-world environment. However, incongruency of input information across modalities is inevitable and could be challenging for robots to process. To tackle this challenge, our study adopted the neurorobotic paradigm of crossmodal conflict resolution to make a robot express human-like social attention.
View Article and Find Full Text PDFDue to the dynamic nature of human language, automatic speech recognition (ASR) systems need to continuously acquire new vocabulary. Out-Of-Vocabulary (OOV) words, such as trending words and new named entities, pose problems to modern ASR systems that require long training times to adapt their large numbers of parameters. Different from most previous research focusing on language model post-processing, we tackle this problem on an earlier processing level and eliminate the bias in acoustic modeling to recognize OOV words acoustically.
View Article and Find Full Text PDFWe propose a neural learning approach for a humanoid exercise robot that can automatically analyze and correct physical exercises. Such an exercise robot should be able to train many different human partners over time and thus requires the ability for lifelong learning. To this end, we develop a modified Grow-When-Required (GWR) network with recurrent connections, episodic memory and a novel subnode mechanism for learning spatiotemporal relationships of body movements and poses.
View Article and Find Full Text PDFA cognitive agent performing in the real world needs to learn relevant concepts about its environment (e.g., objects, color, and shapes) and react accordingly.
View Article and Find Full Text PDFIEEE Trans Neural Netw Learn Syst
February 2024
The aim of this work is to investigate the impact of crossmodal self-supervised pre-training for speech reconstruction (video-to-audio) by leveraging the natural co-occurrence of audio and visual streams in videos. We propose LipSound2 that consists of an encoder-decoder architecture and location-aware attention mechanism to map face image sequences to mel-scale spectrograms directly without requiring any human annotations. The proposed LipSound2 model is first pre-trained on ∼ 2400 -h multilingual (e.
View Article and Find Full Text PDFCollaborative interactions require social robots to share the users' perspective on the interactions and adapt to the dynamics of their affective behaviour. Yet, current approaches for affective behaviour generation in robots focus on instantaneous perception to generate a one-to-one mapping between observed human expressions and static robot actions. In this paper, we propose a novel framework for affect-driven behaviour generation in social robots.
View Article and Find Full Text PDFHuman language is inherently embodied and grounded in sensorimotor representations of the self and the world around it. This suggests that the body schema and ideomotor action-effect associations play an important role in language understanding, language generation, and verbal/physical interaction with others. There are computational models that focus purely on non-verbal interaction between humans and robots, and there are computational models for dialog systems that focus only on verbal interaction.
View Article and Find Full Text PDFLong-term human-robot interaction requires the continuous acquisition of knowledge. This ability is referred to as lifelong learning (LL). LL is a long-standing challenge in machine learning due to catastrophic forgetting, which states that continuously learning from novel experiences leads to a decrease in the performance of previously acquired knowledge.
View Article and Find Full Text PDFAs robots become more advanced and capable, developing trust is an important factor of human-robot interaction and cooperation. However, as multiple environmental and social factors can influence trust, it is important to develop more elaborate scenarios and methods to measure human-robot trust. A widely used measurement of trust in social science is the .
View Article and Find Full Text PDFHuman infants are able to acquire natural language seemingly easily at an early age. Their language learning seems to occur simultaneously with learning other cognitive functions as well as with playful interactions with the environment and caregivers. From a neuroscientific perspective, natural language is embodied, grounded in most, if not all, sensory and sensorimotor modalities, and acquired by means of crossmodal integration.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
March 2022
Generative adversarial networks conditioned on textual image descriptions are capable of generating realistic-looking images. However, current methods still struggle to generate images based on complex image captions from a heterogeneous domain. Furthermore, quantitatively evaluating these text-to-image models is challenging, as most evaluation metrics only judge image quality but not the conformity between the image and its caption.
View Article and Find Full Text PDFTo overcome novel challenges in complex domestic environments, humanoid robots can learn from human teachers. We propose that the capability for social interaction should be a key factor in this teaching process and benefits both the subjective experience of the human user and the learning process itself. To support our hypothesis, we present a Human-Robot Interaction study on human-assisted visuomotor learning with the robot NICO, the Neuro-Inspired COmpanion, a child-sized humanoid.
View Article and Find Full Text PDFSelective attention plays an essential role in information acquisition and utilization from the environment. In the past 50 years, research on selective attention has been a central topic in cognitive science. Compared with unimodal studies, crossmodal studies are more complex but necessary to solve real-world challenges in both human experiments and computational modeling.
View Article and Find Full Text PDFExpectation learning is a unsupervised learning process which uses multisensory bindings to enhance unisensory perception. For instance, as humans, we learn to associate a barking sound with the visual appearance of a dog, and we continuously fine-tune this association over time, as we learn, e.g.
View Article and Find Full Text PDFThe problem of generating structured Knowledge Graphs (KGs) is difficult and open but relevant to a range of tasks related to decision making and information augmentation. A promising approach is to study generating KGs as a relational representation of inputs (e.g.
View Article and Find Full Text PDFReinforcement learning is generally accepted to be an appropriate and successful method to learn robot control. Symbolic action planning is useful to resolve causal dependencies and to break a causally complex problem down into a sequence of simpler high-level actions. A problem with the integration of both approaches is that action planning is based on , whereas reinforcement learning is usually driven by a function.
View Article and Find Full Text PDFAn important step for assistive systems and robot companions operating in human environments is to learn the compositionality of human activities, i.e., recognize both activities and their comprising actions.
View Article and Find Full Text PDFHumans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational learning systems and autonomous agents interacting in the real world and processing continuous streams of information.
View Article and Find Full Text PDFInt J Neural Syst
June 2019
This paper attempts to solve the typical problems of self-organizing growing network models, i.e. (a) an influence of the order of input data on the self-organizing ability, (b) an instability to high-dimensional data and an excessive sensitivity to noise, and (c) an expensive computational cost by integrating Kernel Bayes Rule (KBR) and Correntropy-Induced Metric (CIM) into Adaptive Resonance Theory (ART) framework.
View Article and Find Full Text PDFArtificial autonomous agents and robots interacting in complex environments are required to continually acquire and fine-tune knowledge over sustained periods of time. The ability to learn from continuous streams of information is referred to as and represents a long-standing challenge for neural network models due to in which novel sensory experience interferes with existing representations and leads to abrupt decreases in the performance on previously acquired knowledge. Computational models of lifelong learning typically alleviate catastrophic forgetting in experimental scenarios with given datasets of static images and limited complexity, thereby differing significantly from the conditions artificial agents are exposed to.
View Article and Find Full Text PDF