How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g., object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
DOI: http://dx.doi.org/10.1109/TPAMI.2022.3200245
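To make the fusion idea concrete, below is a minimal sketch of transformer-based fusion between a perspective-view image feature map and a LiDAR bird's-eye-view (BEV) feature map. It is not the authors' implementation: the channel and grid sizes, the single fusion resolution, the omission of positional embeddings, and the use of a stock `nn.TransformerEncoder` are all simplifying assumptions; TransFuser applies such fusion at multiple resolutions inside both convolutional branches.

```python
# Sketch: self-attention fusion of image (perspective-view) features and
# LiDAR BEV features. Assumes made-up sizes and a standard TransformerEncoder.
import torch
import torch.nn as nn


class FusionBlock(nn.Module):
    def __init__(self, channels: int = 256, num_heads: int = 4, num_layers: int = 1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=4 * channels, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, img_feat: torch.Tensor, lidar_feat: torch.Tensor):
        # img_feat:   (B, C, Hi, Wi) perspective-view features
        # lidar_feat: (B, C, Hl, Wl) BEV features
        b, c, hi, wi = img_feat.shape
        _, _, hl, wl = lidar_feat.shape
        # Flatten both grids into token sequences and concatenate them so
        # self-attention can exchange information across the two views.
        tokens = torch.cat([
            img_feat.flatten(2).transpose(1, 2),    # (B, Hi*Wi, C)
            lidar_feat.flatten(2).transpose(1, 2),  # (B, Hl*Wl, C)
        ], dim=1)
        fused = self.encoder(tokens)
        # Split the fused tokens back into the two spatial feature maps.
        img_out = fused[:, : hi * wi].transpose(1, 2).reshape(b, c, hi, wi)
        lidar_out = fused[:, hi * wi:].transpose(1, 2).reshape(b, c, hl, wl)
        return img_out, lidar_out


if __name__ == "__main__":
    block = FusionBlock()
    img = torch.randn(2, 256, 8, 8)    # downsampled camera features
    lidar = torch.randn(2, 256, 8, 8)  # downsampled BEV features
    img_f, lidar_f = block(img, lidar)
    print(img_f.shape, lidar_f.shape)  # torch.Size([2, 256, 8, 8]) twice
```

In this reading, fusion amounts to letting every image token attend to every BEV token (and vice versa), after which each branch continues with its enriched feature map.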
Bioengineering (Basel)
December 2024
Fusion Oriented Research for Disruptive Science and Technology, Japan Science and Technology Agency, 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-8666, Japan.
Mechanical forces influence cellular proliferation, differentiation, tissue morphogenesis, and functional expression within the body. To comprehend the impact of these forces on living organisms, their quantification is essential. This study introduces a novel microdifferential pressure measurement device tailored for cellular-scale pressure assessments.
Front Neurosci
January 2025
The Basic Department, The Tourism College of Changchun University, Changchun, China.
Introduction: In the field of medical listening assessments, accurate transcription and effective cognitive load management are critical for enhancing healthcare delivery. Traditional speech recognition systems, while successful in general applications, often struggle in medical contexts where the cognitive state of the listener plays a significant role. These conventional methods typically rely on audio-only inputs and cannot account for the listener's cognitive load, which reduces their accuracy and effectiveness in complex medical environments.
Sci Rep
January 2025
Macau University of Science and Technology, Faculty of Innovation Engineering, Macau, 999078, China.
RGGB sensor arrays are commonly used in digital cameras and mobile photography. However, images captured in extremely dark conditions often suffer from insufficient exposure because the sensor receives too little light. Existing methods mainly employ U-Net variants, multi-stage camera parameter simulation, or image parameter processing to address this issue.
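For background, raw RGGB (Bayer) data is commonly packed into four half-resolution channels and amplified by an exposure ratio before being fed to a network such as a U-Net variant. The sketch below shows only this generic preprocessing step; the black level, white level, and gain values are illustrative assumptions, and this is not claimed to be the cited paper's pipeline.

```python
# Illustrative preprocessing for raw RGGB Bayer data: pack the 2x2 mosaic into
# four half-resolution channels (R, G1, G2, B) and scale by an exposure gain.
# Constants below are assumptions for demonstration only.
import numpy as np


def pack_rggb(raw: np.ndarray, black_level: float = 512.0,
              white_level: float = 16383.0, gain: float = 100.0) -> np.ndarray:
    """raw: (H, W) Bayer mosaic with an R G / G B 2x2 pattern."""
    # Normalize to [0, 1] after black-level subtraction.
    norm = np.clip((raw.astype(np.float32) - black_level)
                   / (white_level - black_level), 0.0, 1.0)
    packed = np.stack([
        norm[0::2, 0::2],  # R
        norm[0::2, 1::2],  # G1
        norm[1::2, 0::2],  # G2
        norm[1::2, 1::2],  # B
    ], axis=-1)            # (H/2, W/2, 4)
    # Brighten the severely underexposed capture by a fixed exposure ratio.
    return np.clip(packed * gain, 0.0, 1.0)


if __name__ == "__main__":
    dark_raw = np.random.randint(500, 700, size=(16, 16), dtype=np.uint16)
    print(pack_rggb(dark_raw).shape)  # (8, 8, 4)
```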
Cell Struct Funct
January 2025
Department of Pathology and Biology of Diseases, Graduate School of Medicine, Kyoto University.
Live imaging techniques have revolutionized our understanding of paracrine signaling, a crucial form of cell-to-cell communication in biological processes. This review examines recent advances in visualizing and tracking paracrine factors through four key stages: secretion from producing cells, diffusion through extracellular space, binding to target cells, and activation of intracellular signaling within target cells. Secretion of paracrine factors can be visualized directly by fluorescent protein tagging of the ligand, or indirectly by visualizing cleavage of the transmembrane pro-ligands or plasma membrane fusion of endosomes containing the paracrine factors.
Sci Rep
January 2025
Space Science Centre (ANGKASA), Universiti Kebangsaan Malaysia, Bangi, 43600 UKM, Selangor D.E, Malaysia.
With rising demands on camera surveillance systems, efficient anomaly detection is important for improving public safety in complex environments. Most available methods fail to capture long-term temporal dependencies and spatial correlations, especially in dynamic multi-camera settings. In addition, many traditional methods rely heavily on large labeled datasets and generalize poorly when they encounter unseen anomalies.