A PHP Error was encountered

Severity: Warning

Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests

Filename: helpers/my_audit_helper.php

Line Number: 176

Backtrace:

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 1034
Function: getPubMedXML

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3152
Function: GetPubMedArticleOutput_2016

File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global

File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword

File: /var/www/html/index.php
Line: 316
Function: require_once

Self-Supervised 3-D Semantic Representation Learning for Vision-and-Language Navigation. | LitMetric

In vision-and-language navigation (VLN) tasks, most current methods primarily utilize RGB images, overlooking the rich 3-D semantic data inherent to environments. To rectify this, we introduce a novel VLN framework that integrates 3-D semantic information into the navigation process. Our approach features a self-supervised training scheme that incorporates voxel-level 3-D semantic reconstruction to create a detailed 3-D semantic representation. A key component of this framework is a pretext task focused on region queries, which determines the presence of objects in specific 3-D areas. Following this, we devise an long short-term memory (LSTM)-based navigation model that is trained using our 3-D semantic representations. To maximize the utility of these 3-D semantic representations, we implement a cross-modal distillation strategy. This strategy encourages the RGB model's outputs to emulate those from the 3-D semantic feature network, enabling the concurrent training of both branches to merge RGB and 3-D semantic data effectively. Comprehensive evaluations on both the R2R and R4R datasets reveal that our method significantly enhances performance in VLN tasks.

Download full-text PDF

Source
http://dx.doi.org/10.1109/TNNLS.2024.3395633DOI Listing

Publication Analysis

Top Keywords

3-d semantic
36
semantic
9
3-d
9
semantic representation
8
vision-and-language navigation
8
vln tasks
8
semantic data
8
semantic representations
8
self-supervised 3-d
4
representation learning
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!