Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while achieving performance comparable to that of previous complex hand-crafted detectors. However, their performance on Video Object Detection (VOD) has not been well explored. In this paper, we present TransVOD, the first end-to-end video object detection system based on simple yet effective spatial-temporal Transformer architectures. The first goal of this paper is to streamline the pipeline of current VOD methods, effectively removing the need for many hand-crafted components for feature aggregation, e.g., optical flow models and relation networks. Moreover, benefiting from the object query design in DETR, our method does not need post-processing methods such as Seq-NMS. In particular, we present a temporal Transformer to aggregate both the spatial object queries and the feature memories of each frame. Our temporal Transformer consists of two components: a Temporal Query Encoder (TQE) that fuses object queries, and a Temporal Deformable Transformer Decoder (TDTD) that obtains the detection results for the current frame. These designs boost the strong Deformable DETR baseline by a significant margin (3% to 4% mAP) on the ImageNet VID dataset, and TransVOD yields competitive performance on the ImageNet VID benchmark. We then present two improved versions of TransVOD: TransVOD++ and TransVOD Lite. The former fuses object-level information into the object queries via dynamic convolution, while the latter models entire video clips as the output to speed up inference. We give a detailed analysis of all three models in the experiments section. In particular, our proposed TransVOD++ sets a new state-of-the-art record in terms of accuracy on ImageNet VID with 90.0% mAP, and TransVOD Lite achieves the best speed and accuracy trade-off with 83.7% mAP while running at around 30 FPS on a single V100 GPU.
Code and models are available at https://github.com/SJTU-LuHe/TransVOD.
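The abstract describes the Temporal Query Encoder (TQE) only at a high level, as a component that fuses object queries across frames. To make "fusing queries" concrete, here is a minimal sketch under stated assumptions: each current-frame query attends over the object queries pooled from reference frames using single-head, scaled dot-product attention with no learned projections, and the attended vector is added back as a residual. The function names (`fuse_queries`, `softmax`, `dot`) and this attention form are hypothetical simplifications, not the TransVOD implementation, which uses learned Transformer layers (see the linked repository for the actual code).

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_queries(current: List[Vector], references: List[List[Vector]]) -> List[Vector]:
    """Hypothetical stand-in for temporal query fusion (not the TQE itself).

    current:    object queries of the current frame, each of dimension d.
    references: one list of object queries per reference frame.
    Returns one fused query per current-frame query.
    """
    # Pool the queries from all reference frames into one key/value set.
    pool = [q for frame in references for q in frame]
    if not current or not pool:
        # Nothing to attend to: return an unchanged copy of the queries.
        return [list(q) for q in current]
    d = len(current[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in current:
        # Attention weights of this query over every pooled reference query.
        weights = softmax([dot(q, k) * scale for k in pool])
        # Weighted sum of reference queries (keys double as values here).
        attended = [sum(w * k[i] for w, k in zip(weights, pool)) for i in range(d)]
        # Residual connection: keep the original query and add temporal context.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


current = [[1.0, 0.0], [0.0, 1.0]]
refs = [[[1.0, 0.0]], [[1.0, 0.0]]]  # two reference frames, one query each
print(fuse_queries(current, refs))  # [[2.0, 0.0], [1.0, 1.0]]
```

Because both reference queries in the usage example are identical, every current query gives them equal weight, so each fused query is its original value plus `[1.0, 0.0]`. In a trained model, learned projections and multiple layers would let the weights favor reference queries that describe the same object.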
DOI: http://dx.doi.org/10.1109/TPAMI.2022.3223955
Data Brief
February 2025
Department of Electrical and Computer Engineering, University of Michigan-Dearborn, 4901 Evergreen Rd, Dearborn, 48128 MI, USA.
In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories.
Heliyon
January 2025
College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, 325035, China.
In the context of graduate learning in China, mentors are the teachers with the highest frequency of contact and the closest relationships with postgraduate students. Nevertheless, a number of issues pertaining to the relationship between mentors and postgraduate students have emerged with increasing frequency in recent years, resulting in a notable decline in the quality of graduate education. In this paper, we investigate the influence of the relationship between mentors and postgraduate students on the postgraduate learning performance, with postgraduate students' admission motivation and learning pressure acting as moderating variables.
Genes Brain Behav
February 2025
Département de Réadaptation et gériatrie, University of Geneva, Geneva, Switzerland.
Human microbiota-associated murine models, using fecal microbiota transplantation (FMT) from human donors, help explore the microbiome's role in diseases like Alzheimer's disease (AD). This study examines how gut bacteria from donors with protective factors against AD influence behavior and brain pathology in an AD mouse model. Female 3xTgAD mice received weekly FMT for 2 months from (i) an 80-year-old AD patient (AD-FMT), (ii) a cognitively healthy 73-year-old with the protective APOE ε2 allele (APOEε2-FMT), (iii) a 22-year-old healthy donor (Young-FMT), and (iv) untreated mice (Mice-FMT).
Appl Radiat Isot
January 2025
Tokyo City University, 1-28-1, Tamazutsumi, Setagaya-ku, Tokyo, 158-8557, Japan.
In clearance measurements involving a single material type, a conversion factor was applied to convert measurement results to activity based on an assumed uniform density. However, this factor has been found to underestimate activity in material mixtures. In this study, we proposed a method to identify the location with the lowest detection sensitivity (minimum location) in a mixture and evaluated its applicability to the conversion factor.
Sensors (Basel)
January 2025
Instituto de Telecomunicações (IT), Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.
Shrimp farming is a growing industry, and automating certain processes within aquaculture tanks is becoming increasingly important to improve efficiency. This paper proposes an image-based system designed to address four key tasks in an aquaculture tank with : estimating shrimp length and weight, counting shrimps, and evaluating feed pellet food attractiveness. A setup was designed, including a camera connected to a Raspberry Pi computer, to capture high-quality images around a feeding plate during feeding moments.