Purpose: Surgical workflow recognition has emerged as an important, yet highly challenging, component of computer-assisted intervention systems for the modern operating room. Although CNN-based approaches achieve excellent performance, they do not learn global and long-range semantic interactions well because of the inductive bias inherent in convolution.
Methods: In this paper, we propose a temporal-based Swin Transformer network (TSTNet) for surgical video workflow recognition. TSTNet contains two main parts: a Swin Transformer and an LSTM. The Swin Transformer incorporates the attention mechanism to encode long-range dependencies and learn highly expressive representations. The LSTM captures long-range temporal dependencies and is used to extract temporal information. TSTNet organically combines the two components to extract spatiotemporal features that carry richer contextual information. In addition, based on the natural structure of surgical video, we propose an a priori revision algorithm (PRA) that exploits a priori knowledge of the order of surgical phases. This strategy refines the output of TSTNet and further improves recognition performance.
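The abstract does not include implementation details, but the described pipeline (a Swin Transformer extracting per-frame features, an LSTM modelling temporal context, and a classification head predicting the surgical phase of each frame) can be sketched as below. This is a minimal illustration, not the authors' released code; the backbone name, hidden size, and the seven Cholec80 phases are assumptions for the example.

```python
# Minimal sketch of a Swin-Transformer-plus-LSTM phase recognizer (assumed design,
# not the authors' implementation). Requires torch and timm.
import torch
import torch.nn as nn
import timm  # provides pretrained Swin Transformer backbones


class TSTNetSketch(nn.Module):
    def __init__(self, num_phases: int = 7, hidden_size: int = 512):
        super().__init__()
        # Per-frame spatial encoder: Swin Transformer with its classifier head removed,
        # so the forward pass returns pooled frame features.
        self.backbone = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=True, num_classes=0
        )
        feat_dim = self.backbone.num_features
        # Temporal encoder over the sequence of frame features.
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        # Frame-wise phase classifier.
        self.head = nn.Linear(hidden_size, num_phases)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, 224, 224) -> phase logits of shape (batch, time, num_phases)
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        temporal, _ = self.lstm(feats)
        return self.head(temporal)


# Example usage on a dummy 16-frame clip.
model = TSTNetSketch()
logits = model(torch.randn(1, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 16, 7])
```

The abstract likewise does not spell out the PRA rules. As one simplified, assumed variant of revising predictions with phase-order priors: since cholecystectomy phases follow a largely fixed order, frame-wise predictions that jump backwards can be replaced by the last plausible phase.

```python
# Hedged, simplified illustration of order-based revision (not the paper's exact PRA):
# keep the predicted phase sequence from moving backwards.
def revise_with_phase_order(pred_phases):
    revised, current = [], 0
    for p in pred_phases:
        if p >= current:          # a forward (or repeated) phase is plausible: accept it
            current = p
        revised.append(current)   # otherwise keep the last plausible phase
    return revised


print(revise_with_phase_order([0, 0, 1, 0, 2, 1, 2, 3]))  # [0, 0, 1, 1, 2, 2, 2, 3]
```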
Results: We conduct extensive experiments on the Cholec80 dataset to validate the effectiveness of the TSTNet-PRA method. Our method achieves excellent performance on Cholec80, with an accuracy of 92.8%, greatly exceeding state-of-the-art methods.
Conclusion: By modelling long-range temporal information together with multi-scale visual information, we propose the TSTNet-PRA method. Evaluated on a large public dataset, it shows recognition capability superior to other spatiotemporal networks.
DOI: http://dx.doi.org/10.1007/s11548-022-02785-y