Introduction: Human action recognition is crucial for enabling machines to understand human behavior, with applications spanning video-based surveillance, human-robot collaboration, sports analysis, and entertainment. The immense diversity in human movement and appearance poses a significant challenge in this field, especially in drone-recorded RGB videos, where dynamic backgrounds, motion blur, occlusions, varying capture angles, and exposure problems greatly complicate recognition.

Methods: In this study, we propose a method that addresses these challenges in RGB videos captured by drones. Our approach first segments each video into individual frames and then applies preprocessing to the RGB frames. The preprocessing aims to reduce computational cost, improve image quality, and enhance foreground objects while removing the background.
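A minimal sketch of such a preprocessing stage is shown below, assuming OpenCV is available; the MOG2 background subtractor, the target resolution, and histogram equalization are illustrative stand-ins, since the abstract does not name the exact operations used.

```python
# Illustrative preprocessing sketch (not the paper's exact pipeline).
import cv2

def preprocess_video(path, size=(320, 240)):
    """Split a drone video into frames, downscale them, and
    suppress the background with a MOG2 subtractor."""
    cap = cv2.VideoCapture(path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=200)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)               # cut computational cost
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.equalizeHist(gray)                 # normalize exposure
        mask = subtractor.apply(gray)                 # foreground mask
        frames.append(cv2.bitwise_and(gray, gray, mask=mask))
    cap.release()
    return frames
```

Foreground masking of this kind yields the grayscale silhouettes used by the detection stage described next.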

Result: This yields improved visibility of foreground objects while suppressing background noise. Next, we employ the YOLOv9 detection algorithm to locate human bodies within the images. From the grayscale silhouette, we extract the human skeleton and identify 15 key points: the head, neck, navel, and the left and right shoulders, elbows, wrists, hips, knees, and ankles. From these points we derive positional features, angular and distance relationships between joints, as well as 3D point clouds and fiducial points. We then optimize this feature set using the kernel discriminant analysis (KDA) optimizer, followed by classification with a convolutional neural network (CNN). To validate our system, we conducted experiments on three benchmark datasets: UAV-Human, UCF, and Drone-Action.
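To make the feature step concrete, the sketch below computes torso-normalized pairwise distances and a few joint angles from the 15 keypoints. The joint names, the normalization, and the choice of angle triplets are assumptions for illustration, not the paper's exact feature set (which also includes 3D point clouds and fiducial points).

```python
# Illustrative feature extraction from 15 skeleton keypoints.
# Joint list and angle triplets are assumed for demonstration.
import numpy as np

JOINTS = ["head", "neck", "l_shoulder", "r_shoulder", "l_elbow",
          "r_elbow", "l_wrist", "r_wrist", "l_hip", "r_hip",
          "l_knee", "r_knee", "l_ankle", "r_ankle", "navel"]

def joint_angle(a, b, c):
    """Angle at joint b (radians) formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def skeleton_features(pts):
    """pts: (15, 2) array of keypoint coordinates for one frame."""
    idx = {name: i for i, name in enumerate(JOINTS)}
    # Pairwise distances, normalized by torso length for scale invariance.
    torso = np.linalg.norm(pts[idx["neck"]] - pts[idx["navel"]]) + 1e-8
    dists = [np.linalg.norm(pts[i] - pts[j]) / torso
             for i in range(len(pts)) for j in range(i + 1, len(pts))]
    # Example joint angles at the elbows and knees.
    angles = [
        joint_angle(pts[idx["l_shoulder"]], pts[idx["l_elbow"]], pts[idx["l_wrist"]]),
        joint_angle(pts[idx["r_shoulder"]], pts[idx["r_elbow"]], pts[idx["r_wrist"]]),
        joint_angle(pts[idx["l_hip"]], pts[idx["l_knee"]], pts[idx["l_ankle"]]),
        joint_angle(pts[idx["r_hip"]], pts[idx["r_knee"]], pts[idx["r_ankle"]]),
    ]
    return np.array(dists + angles)
```

In the full pipeline, per-frame vectors of this kind would then be optimized with KDA and passed to the classifier.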

Discussion: On these datasets, our proposed model achieved action recognition accuracies of 0.68, 0.75, and 0.83, respectively.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652500
DOI: http://dx.doi.org/10.3389/fnbot.2024.1443678

Publication Analysis

Top Keywords

rgb videos (8)
frames preprocessing (8)
foreground objects (8)
human (6)
unmanned aerial (4)
aerial vehicles (4)
vehicles human (4)
human detection (4)
detection recognition (4)
recognition neural-network (4)

Similar Publications

Background: Falls are among the most prevalent workplace accidents, necessitating thorough screening for fall susceptibility and customization of individualized fall-prevention programs. The aim of this study was to develop and validate a high fall-risk prediction model for middle-aged workers using machine learning (ML) and video-based analysis of the first three steps.

Methods: Training data (n=190, age 54.


In industrial robotic-arm grasping operations within disordered environments, physical information about an object's surface is often lost due to varying lighting conditions, weak surface textures, and sensor noise, leading to inaccurate object detection and pose estimation. A method for industrial object pose estimation using point cloud data is proposed to improve pose estimation accuracy.


Steganography is used to hide sensitive data, including images, audio, text, and video, in an invisible way so that its presence cannot be detected. Image-based steganography uses images as a cover medium for hiding and transmitting sensitive information over the internet. However, it is a challenging task due to requirements of transparency, security, computational efficiency, tamper protection, and payload capacity.


This dataset contains demographic, morphological, and pathological data, along with endoscopic images and videos, for 191 patients with colorectal polyps. Morphological data are annotated according to current international gastroenterology classification references, including the Paris, Pit pattern, and JNET classifications. Pathological data include the diagnosis of each polyp (Tubular, Villous, Tubulovillous, Hyperplastic, Serrated, Inflammatory, and Adenocarcinoma) together with dysplasia grade and differentiation.


Photo- and video-based reidentification of green sea turtles using their natural markers is far less invasive than artificial tagging. An RGB camera mounted on a man-portable rig was used to collect video data on Greater Talang Island (1°54'45″N, 109°46'33″E) from September to October 2022 and in September 2023. This islet is located 30 minutes offshore from the Sematan district in southwest Sarawak, Malaysia.

