The field of surgical computer vision has seen considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, offer a potential solution to these annotation costs by learning useful representations from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains largely unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of their performance on the Cholec80 dataset for two fundamental tasks in surgical context understanding: phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correctly transferring these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL - up to 7.4% on phase recognition and 20% on tool presence detection - and outperforms state-of-the-art semi-supervised phase recognition approaches by up to 14%. Further results obtained on a highly diverse selection of surgical datasets demonstrate strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.
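As a concrete illustration of the contrastive objective behind methods like SimCLR and MoCo v2, the sketch below implements the NT-Xent (normalized temperature-scaled cross-entropy) loss in NumPy. This is a generic, illustrative implementation rather than the authors' released code, and the temperature value is an assumption.

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent loss over a batch of 2N embeddings, where rows i and
    i+N are the two augmented views of the same image."""
    n = z.shape[0] // 2
    # L2-normalize so dot products become cosine similarities
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    # Exclude each row's similarity with itself
    np.fill_diagonal(sim, -np.inf)
    # The positive pair for row i is row (i + n) mod 2n
    targets = (np.arange(2 * n) + n) % (2 * n)
    # Cross-entropy of each row's logits against its positive index
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), targets] - logsumexp)
    return loss.mean()
```

In practice the loss is minimized so that the two views of each image score higher similarity than all other pairs in the batch, which is what lets these methods learn from unlabeled frames.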
DOI: http://dx.doi.org/10.1016/j.media.2023.102844
Sensors (Basel)
December 2024
Department of Electromechanical Engineering, University of Beira Interior, Rua Marquês d'Ávila e Bolama, 6201-001 Covilhã, Portugal.
This article presents the development of a resistive frost-detection sensor fabricated using Fused Filament Fabrication (FFF) with a conductive filament. This sensor was designed to enhance demand-defrost control in industrial refrigeration systems. Frost accumulation on evaporator surfaces blocks airflow and creates a thermal insulating barrier that reduces heat exchange efficiency, increasing energy consumption and operational costs.
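The demand-defrost idea described above can be sketched as a controller that triggers a defrost cycle only once the sensor reading stays past a frost threshold for several consecutive samples. The resistance threshold, the polarity of the frost response, and the window length below are hypothetical values chosen purely for illustration, not the paper's calibration:

```python
def defrost_needed(resistance_ohm, history, threshold_ohm=1.2e4, hold_samples=3):
    """Return True once the sensor reading has stayed past the
    (hypothetical) frost threshold for `hold_samples` consecutive
    samples, filtering out transient spikes.

    history: list of recent boolean 'past threshold' flags (mutated in place).
    """
    history.append(resistance_ohm > threshold_ohm)
    # Keep only the most recent window of readings
    del history[:-hold_samples]
    return len(history) == hold_samples and all(history)
```

Requiring several consecutive out-of-range samples is a common debouncing choice that keeps a single noisy reading from starting an unnecessary, energy-costly defrost cycle.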
Sensors (Basel)
December 2024
School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China.
RGB-T salient object detection (SOD) has received considerable attention in the field of computer vision. Although existing methods achieve notable detection performance in certain scenarios, challenges remain. Many methods fail to fully utilize high-frequency and low-frequency features during cross-scale feature interaction, limiting detection performance.
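To make the high/low-frequency split concrete, the toy function below decomposes a 2-D image into a low-frequency band (a box blur) and a high-frequency residual, so that the two bands sum back to the input exactly. Real RGB-T SOD networks use learned modules for this decoupling; this is only an illustrative stand-in:

```python
import numpy as np

def split_frequency_bands(img, k=5):
    """Split a 2-D image into low- and high-frequency components:
    the low band (box blur) captures smooth structure, the high band
    (residual) captures edges and texture, and img == low + high."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    # Box blur via a sliding-window mean
    low = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            low += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    low /= k * k
    high = img - low
    return low, high
```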
Sensors (Basel)
December 2024
School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China.
Human pose estimation is an important research direction in the field of computer vision, which aims to accurately identify the position and posture of keypoints of the human body from images or videos. However, multi-person pose estimation yields false detections or missed detections in dense crowds, and small targets remain difficult to detect. In this paper, we propose a Mamba-based human pose estimation method.
Sensors (Basel)
December 2024
School of Biological and Environmental Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, UK.
Camera traps offer enormous new opportunities in ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Integrating vision-language models into these workflows could address this gap by providing enhanced contextual understanding and enabling advanced queries across temporal and spatial dimensions. Here, we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps.
Sensors (Basel)
December 2024
School of Software, Kwangwoon University, Kwangwoon-ro 20, Nowon-gu, Seoul 01897, Republic of Korea.
Object tracking is a challenging task in computer vision. While simple tracking methods offer high speed, they often lose track of targets. To address this issue, traditional methods typically rely on complex algorithms.