How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g., object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
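The fusion idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each feature map has already been flattened into a list of token vectors of equal dimension, uses single-head attention with identity query/key/value projections instead of learned ones, and adds the attended features back to each branch residually. All function names and the token counts are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained transformer would learn these.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return attended


def fuse(image_tokens, lidar_tokens):
    """Attend jointly over both modalities, then split the result per branch.

    Assumption: the attended features are added residually to each branch.
    """
    joint = image_tokens + lidar_tokens  # every token can attend to both modalities
    attended = self_attention(joint)
    n_image = len(image_tokens)

    def add(x, y):
        return [a + b for a, b in zip(x, y)]

    fused_image = [add(t, a) for t, a in zip(image_tokens, attended[:n_image])]
    fused_lidar = [add(t, a) for t, a in zip(lidar_tokens, attended[n_image:])]
    return fused_image, fused_lidar


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Hypothetical token counts standing in for feature maps at two resolutions.
    for n_tokens in (16, 4):
        image = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
        lidar = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
        fused_image, fused_lidar = fuse(image, lidar)
        print(n_tokens, len(fused_image), len(fused_lidar))
```

Because the two token sets are concatenated before attention, each image token can draw on LiDAR tokens and vice versa, which is the property that distinguishes attention-based fusion from a fixed geometric projection between the two views.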

Source
http://dx.doi.org/10.1109/TPAMI.2022.3200245
