AI Article Synopsis

  • Learning to adapt to different goals in visual navigation is difficult, so we propose a model-embedded actor-critic architecture for multigoal navigation.
  • We introduce two new designs: the inverse dynamics model (InvDM) to connect states with goals and improve learning efficiency, and multigoal colearning (MgCl) to enhance sample efficiency and utilize unintended positive experiences.
  • Our enhanced navigation model includes two self-supervised tasks: path closed-loop detection to recognize experienced states, and state-target matching to assess state-goal differences, resulting in faster convergence and better generalization than existing methods.

Article Abstract

Learning to adapt to a series of different goals in visual navigation is challenging. In this work, we present a model-embedded actor-critic architecture for the multigoal visual navigation task. To enhance task cooperation in multigoal learning, we introduce two new designs to the reinforcement learning scheme: an inverse dynamics model (InvDM) and multigoal colearning (MgCl). Specifically, InvDM captures the navigation-relevant association between state and goal and provides additional training signals that relieve the sparse reward issue. MgCl improves sample efficiency and enables the agent to learn from unintended positive experiences. In addition, to further improve the scene generalization capability of the agent, we present an enhanced navigation model that consists of two self-supervised auxiliary task modules. The first module, path closed-loop detection, helps determine whether a state has been experienced before. The second, the state-target matching module, estimates the difference between state and goal. Extensive results on the interactive platform AI2-THOR demonstrate that the agent trained with the proposed method converges faster than state-of-the-art methods while retaining good generalization capability. The video demonstration is available at https://vsislab.github.io/mgvn.
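The idea behind an inverse dynamics model is to recover the action that connects two consecutive states, giving the agent one dense self-supervised label per step even when the navigation reward is sparse. As a rough illustration only (not the paper's implementation; the grid world and names such as `infer_action` and `ACTIONS` are invented for this sketch):

```python
# Toy inverse-dynamics signal on a 2-D grid world (illustrative names).
ACTIONS = {
    (0, 1): "right",
    (0, -1): "left",
    (1, 0): "down",
    (-1, 0): "up",
}

def infer_action(state, next_state):
    """Inverse dynamics: recover the action that moved state -> next_state."""
    delta = (next_state[0] - state[0], next_state[1] - state[1])
    return ACTIONS.get(delta)  # None if the transition is not a unit move

# A trajectory yields one dense self-supervised label per step,
# regardless of whether the sparse navigation reward ever fires.
trajectory = [(0, 0), (0, 1), (1, 1), (1, 2)]
labels = [infer_action(s, s2) for s, s2 in zip(trajectory, trajectory[1:])]
print(labels)  # ['right', 'down', 'right']
```

In the actual method these labels would supervise a learned network over visual states rather than a lookup table; the sketch only shows where the extra training signal comes from.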


Source
http://dx.doi.org/10.1109/TNNLS.2021.3057424

Publication Analysis

Top Keywords

visual navigation (12)
reinforcement learning (8)
state goal (8)
generalization capability (8)
navigation multiple (4)
multiple goals (4)
goals based (4)
based deep (4)
deep reinforcement (4)
learning (4)

Similar Publications

The mobility of people with severe visual impairment is limited, affecting their comfort and productivity. According to a 2024 World Health Organization report, about 45 million people are blind, and the annual global cost in lost productivity is estimated at USD 411 billion. Deploying technology can enhance the contribution of people who are visually impaired to the gross domestic product (GDP).


PALMS: Plane-based Accessible Indoor Localization Using Mobile Smartphones.

Int Conf Indoor Position Indoor Navig

October 2024

Computer Science and Engineering, University of California, Santa Cruz, Santa Cruz, United States.

In this paper, we present PALMS, an innovative indoor global localization and relocalization system for mobile smartphones that utilizes publicly available floor plans. Unlike most vision-based methods that require constant visual input, our system adopts a dynamic form of localization that considers a single instantaneous observation and odometry data. The core contribution of this work is the introduction of a particle filter initialization method that leverages the Certainly Empty Space (CES) constraint along with principal orientation matching.
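Restricting the initial particle set to space a floor plan marks as traversable is the intuition behind the CES-style initialization described above. As a minimal sketch under assumed details (the ASCII floor plan, `free_cells`, and `init_particles` are all hypothetical, not from PALMS):

```python
import random

# Illustrative floor plan: '#' = wall, '.' = certainly empty (free) space.
FLOOR_PLAN = [
    "#####",
    "#...#",
    "#.#.#",
    "#...#",
    "#####",
]

def free_cells(plan):
    """Enumerate cells the floor plan marks as free."""
    return [(r, c) for r, row in enumerate(plan)
            for c, ch in enumerate(row) if ch == "."]

def init_particles(plan, n, seed=0):
    """Initialize n particles, constrained to free cells only."""
    rng = random.Random(seed)
    cells = free_cells(plan)
    return [rng.choice(cells) for _ in range(n)]

particles = init_particles(FLOOR_PLAN, 100)
# Every particle lies in free space, never inside a wall.
assert all(FLOOR_PLAN[r][c] == "." for r, c in particles)
```

A real system would also sample particle headings and weight them against observations; the sketch only shows the spatial constraint, which shrinks the initial hypothesis set compared with sampling over the whole map.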


Robust Indoor Pedestrian Backtracking Using Magnetic Signatures and Inertial Data.

Int Conf Indoor Position Indoor Navig

October 2024

Department of Computer Science & Engineering, University of California, Santa Cruz, Santa Cruz, USA.

Navigating unfamiliar environments can be challenging for visually impaired individuals due to difficulties in recognizing distant landmarks or visual cues. This work focuses on a particular form of wayfinding, specifically backtracking a previously taken path, which can be useful for blind pedestrians. We propose a hands-free indoor navigation solution using a smartphone without relying on pre-existing maps or external infrastructure.


Reading, face recognition, and navigation are supported by visuospatial computations in category-selective regions across ventral, lateral, and dorsal visual streams. However, the nature of visuospatial computations across streams and their development in adolescence remain unknown. Using fMRI and population receptive field (pRF) modeling in adolescents and adults, we estimate pRFs in high-level visual cortex and determine their development.


Background: Pain of a chronic nature remains the foremost concern in tertiary spine clinics, yet its elusive nature and quantification challenges persist. Despite extensive research and education on low back pain (LBP), the realm of diagnostic practices lacks a unified approach. Clinically, LBP exhibits a multifaceted character, encompassing conventional assessments of severity and disability, alongside nuanced attributes like pain characterization, duration, and patient expectations.

