Temporal-based Swin Transformer network for workflow recognition of surgical video.

Int J Comput Assist Radiol Surg

Department of General Surgery, Tangdu Hospital, Air Force Medical University, Xiwang, Xi'an, 710038, Shaanxi, China.

Published: January 2023

Purpose: Surgical workflow recognition has emerged as an important yet very challenging component of computer-assisted intervention systems for the modern operating room. Although CNN-based approaches achieve excellent performance, they do not learn global, long-range semantic interactions well because of the inductive bias inherent in convolution.

Methods: We propose a temporal-based Swin Transformer network (TSTNet) for surgical video workflow recognition. TSTNet has two main parts: a Swin Transformer and an LSTM. The Swin Transformer uses the attention mechanism to encode long-range dependencies and learn highly expressive representations. The LSTM is capable of learning long-range dependencies and is used to extract temporal information. TSTNet combines the two components to extract spatiotemporal features that carry more contextual information. In addition, drawing on the natural structure of surgical video, we propose a priori revision algorithm (PRA) that uses a priori information about the order of the surgical phases. This strategy refines the output of TSTNet and further improves recognition performance.
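The abstract does not spell out how the PRA works, so the following is only a minimal sketch of the general idea of revising frame-wise phase predictions with prior knowledge of phase order, not the authors' algorithm. The function name `revise_phases`, the `allowed` transition table, and the `min_run` persistence threshold are all assumptions introduced for illustration.

```python
from typing import Dict, List, Set


def revise_phases(
    preds: List[int],
    allowed: Dict[int, Set[int]],
    min_run: int = 3,
) -> List[int]:
    """Revise per-frame phase predictions using a phase-order prior.

    Hypothetical sketch, not the published PRA.
    preds   : predicted phase index for each frame, in temporal order.
    allowed : assumed prior; maps a phase to the phases that may follow it.
    min_run : a switch is accepted only after this many consecutive
              frames predict the same allowed next phase.
    """
    if not preds:
        return []

    revised: List[int] = []
    current = preds[0]   # assumption: the first prediction is trusted
    candidate = None     # allowed next phase currently being observed
    run = 0              # consecutive frames predicting `candidate`

    for p in preds:
        if p == current or p not in allowed.get(current, set()):
            # Either no change, or a transition the prior forbids:
            # keep the current phase and discard any pending candidate.
            candidate, run = None, 0
        else:
            # Allowed transition: count how long it persists.
            if p == candidate:
                run += 1
            else:
                candidate, run = p, 1
            if run >= min_run:
                current = p
                candidate, run = None, 0
        revised.append(current)

    return revised


# Toy example with three strictly ordered phases (0 -> 1 -> 2).
toy_allowed = {0: {1}, 1: {2}, 2: set()}
toy_preds = [0, 0, 2, 0, 1, 1, 1, 0, 1, 2, 2, 2]
toy_revised = revise_phases(toy_preds, toy_allowed, min_run=3)
```

In this sketch, an isolated jump to a phase that the prior forbids is overwritten with the current phase, and an allowed switch takes effect only once it has persisted, which delays the switch by `min_run - 1` frames. A real phase-order prior for Cholec80 would need to be taken from the dataset annotations rather than from the toy table used here.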

Results: We conduct extensive experiments on the Cholec80 dataset to validate the effectiveness of the TSTNet-PRA method. Our method achieves an accuracy of up to 92.8% on Cholec80, greatly exceeding state-of-the-art methods.

Conclusion: We propose the TSTNet-PRA method, which models long-range temporal information and multi-scale visual information. Evaluated on a large public dataset, it shows a recognition capability superior to that of other spatiotemporal networks.

Source
http://dx.doi.org/10.1007/s11548-022-02785-y
