Continual Learning (CL) is recognized as a storage-efficient and privacy-protecting approach for learning from sequentially arriving medical sites. However, most existing CL methods assume that each site is fully labeled, which is impractical due to budget and expertise constraints. This paper studies Semi-Supervised Continual Learning (SSCL), in which partially labeled sites arrive over time, each delivering only limited labeled data while the majority remains unlabeled.
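A minimal sketch of one training step in this partially-labeled setting, assuming a simple confidence-thresholded pseudo-labeling baseline (the function name, threshold, and loss weighting are illustrative, not the paper's method):

import torch
import torch.nn.functional as F

def sscl_site_update(model, optimizer, labeled_batch, unlabeled_batch, tau=0.9):
    """One step on a partially-labeled site: supervised loss on the few
    labeled samples plus a pseudo-label loss on the unlabeled majority.
    Illustrative baseline only, not the paper's SSCL algorithm."""
    x_l, y_l = labeled_batch          # small labeled portion of the site
    x_u = unlabeled_batch             # large unlabeled portion

    loss_sup = F.cross_entropy(model(x_l), y_l)

    with torch.no_grad():
        probs_u = torch.softmax(model(x_u), dim=1)
        conf, pseudo = probs_u.max(dim=1)
        mask = (conf > tau).float()   # keep only confident pseudo-labels

    logits_u = model(x_u)
    loss_unsup = (F.cross_entropy(logits_u, pseudo, reduction="none") * mask).mean()

    loss = loss_sup + loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()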
Sci Bull (Beijing)
September 2024
In medical image segmentation, it is often necessary to collect opinions from multiple experts to make the final decision. This clinical routine helps to mitigate individual bias. However, when data is annotated by multiple experts, standard deep learning models are often not applicable.
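As a point of contrast, a minimal baseline for multi-rater data is to average the experts' masks into per-pixel soft targets; the sketch below assumes binary masks and a standard BCE loss, and is not the paper's approach:

import torch.nn.functional as F

def multi_rater_soft_loss(logits, expert_masks):
    """expert_masks: (num_experts, B, H, W) binary masks from different raters.
    The mean vote becomes a per-pixel soft target. Illustrative baseline only."""
    soft_target = expert_masks.float().mean(dim=0)   # (B, H, W) in [0, 1]
    return F.binary_cross_entropy_with_logits(logits, soft_target)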
IEEE Trans Med Imaging
August 2024
Scene graph generation (SGG) for surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR). However, previous works have relied primarily on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes such as pose estimation and object detection. This pipeline may compromise the flexibility of learning multimodal representations, consequently constraining overall effectiveness.
IEEE Trans Image Process
August 2024
In this study, we propose a novel approach for RGB-D salient instance segmentation using a dual-branch cross-modal feature calibration architecture called CalibNet. Our method simultaneously calibrates depth and RGB features in the kernel and mask branches to generate instance-aware kernels and mask features. CalibNet consists of simple modules, including a dynamic interactive kernel (DIK) and a weight-sharing fusion (WSF), which work together to generate effective instance-aware kernels and integrate cross-modal features.
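A sketch of what a weight-sharing cross-modal fusion block could look like; the layer layout, channel sizes, and gating scheme are assumptions for illustration, not CalibNet's actual WSF design:

import torch
import torch.nn as nn

class WeightSharingFusion(nn.Module):
    """One shared projection is applied to both RGB and depth features,
    and a learned sigmoid gate decides the per-channel mix. Illustrative only."""
    def __init__(self, channels):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        r = self.shared(rgb_feat)          # same weights for both modalities
        d = self.shared(depth_feat)
        g = self.gate(torch.cat([r, d], dim=1))
        return g * r + (1 - g) * d         # calibrated cross-modal feature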
Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods achieve accurate instrument segmentation, they simultaneously generate segmentation masks for all instruments and thus cannot specify a target object or support an interactive experience. This paper focuses on a novel and essential task in robotic surgery.
Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform.
IEEE Trans Med Imaging
September 2024
Recent advances in artificial intelligence have achieved human-level performance; however, AI-enabled cognitive assistance for therapeutic procedures has not been fully explored nor pre-clinically validated. Here we propose AI-Endo, an intelligent surgical workflow recognition suite for endoscopic submucosal dissection (ESD). Our AI-Endo is trained on high-quality ESD cases from an expert endoscopist, spanning a decade and consisting of 201,026 labeled frames.
This paper introduces the "SurgT: Surgical Tracking" challenge, which was organized in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2022). There were two purposes for the creation of this challenge: (1) the establishment of the first standardized benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated data in surgery. A dataset of 157 stereo endoscopic videos from 20 clinical cases, along with stereo camera calibration parameters, was provided.
IEEE Trans Med Imaging
January 2024
Personalized federated learning (PFL) addresses the data heterogeneity challenge faced by general federated learning (GFL). Rather than learning a single global model, PFL adapts a collection of models to the unique feature distribution of each site. However, current PFL methods rarely consider self-attention networks, which can handle data heterogeneity through long-range dependency modeling, and they do not utilize prediction inconsistencies in local models as an indicator of site uniqueness.
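One simple way to quantify "site uniqueness" from prediction inconsistency is to compare the global and local models' output distributions on the site's own data; the sketch below uses an average KL divergence and is an illustration of the idea, not the paper's measure:

import torch
import torch.nn.functional as F

def site_uniqueness_score(global_model, local_model, loader, device="cpu"):
    """Average KL divergence between global and local softmax outputs on a
    site's data; larger values suggest the site deviates more from the
    global distribution. Illustrative only."""
    global_model.eval()
    local_model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, _ in loader:
            x = x.to(device)
            log_p_global = torch.log_softmax(global_model(x), dim=1)
            p_local = torch.softmax(local_model(x), dim=1)
            total += F.kl_div(log_p_global, p_local, reduction="batchmean").item()
            n += 1
    return total / max(n, 1)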
Purpose: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve surgeon training via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open-data, single-center video dataset.
Int J Comput Assist Radiol Surg
December 2022
Purpose: Real-time surgical workflow analysis has been a key component of computer-assisted intervention systems for improving cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features in a successive spatial-temporal arrangement. As a result, the supportive benefits of intermediate features are partially lost from both the visual and temporal aspects.
IEEE Trans Med Imaging
November 2022
Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre. Previous works rely on conventional aggregation modules.
In this paper, we propose a novel method for Unsupervised Disentanglement of Scene and Motion (UDSM) representations for minimally invasive surgery video retrieval within large databases, which has the potential to advance intelligent and efficient surgical teaching systems. To extract more discriminative video representations, two encoders trained with a triplet ranking loss and an adversarial learning mechanism capture the spatial and temporal information, respectively, yielding disentangled, interpretable features for each frame. In addition, long-range temporal dependencies are modeled at the video level with a temporal aggregation module, and a set of compact binary codes carrying representative features is produced to enable fast retrieval.
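Two generic ingredients of such a retrieval pipeline, a triplet ranking loss and binary hashing with Hamming-distance ranking, can be sketched as below; this is the general recipe under assumed shapes, not the exact UDSM losses or codes:

import torch

# Triplet ranking loss on (anchor, positive, negative) video embeddings.
triplet_loss = torch.nn.TripletMarginLoss(margin=0.2)

def hash_codes(video_embeddings):
    """Turn real-valued video embeddings into compact {-1, +1} binary codes."""
    return torch.sign(video_embeddings)

def retrieve(query_code, database_codes, topk=5):
    """Rank database videos by Hamming distance to the query code.
    query_code: (D,), database_codes: (N, D)."""
    dist = (query_code.unsqueeze(0) != database_codes).float().sum(dim=1)
    return torch.topk(-dist, k=topk).indices   # smallest distances first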
We propose a novel shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection (ESD) surgery. This task is of great clinical significance but extremely challenging due to bleeding, lighting reflection, and motion blur in the complicated surgical environment. Compared with existing solutions, which either neglect geometric relationships among target objects or capture those relationships with complicated aggregation schemes, the proposed network achieves satisfactory accuracy while maintaining real-time performance by taking full advantage of the spatial relations among landmarks.
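A minimal way to expose spatial relations among landmarks to a network is to encode pairwise offsets and distances; the sketch below is an illustrative stand-in for such relation cues, not the paper's exact scheme:

import torch

def pairwise_landmark_relations(coords):
    """coords: (N, 2) predicted (x, y) landmark positions.
    Returns (N, N, 3) pairwise offsets and distances. Illustrative only."""
    diff = coords.unsqueeze(1) - coords.unsqueeze(0)   # (N, N, 2) offsets
    dist = diff.norm(dim=-1, keepdim=True)             # (N, N, 1) distances
    return torch.cat([diff, dist], dim=-1)             # (N, N, 3)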
IEEE Trans Med Imaging
March 2022
Multimodal learning usually requires a complete set of modalities during inference to maintain performance. Although training data can be well prepared with multiple high-quality modalities, in many clinical settings only one modality is acquired, and important clinical evaluations must be made from this limited single-modality information. In this work, we propose a privileged knowledge learning framework with a 'Teacher-Student' architecture, in which the complete multimodal knowledge that is only available in the training data (called privileged information) is transferred from a multimodal teacher network to a unimodal student network via both a pixel-level and an image-level distillation scheme.
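A sketch of the two-granularity distillation idea, with a pixel-level term on softened per-pixel predictions and an image-level term on pooled features; the temperature, weighting, and feature pooling are assumptions, not the paper's exact formulation:

import torch.nn.functional as F

def privileged_distillation_loss(student_logits, teacher_logits,
                                 student_feat, teacher_feat, T=2.0, alpha=0.5):
    """Pixel-level plus image-level distillation from a multimodal teacher to a
    unimodal student. Illustrative sketch only."""
    # Pixel-level: match softened per-pixel class distributions.
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    pixel_kd = F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

    # Image-level: match globally pooled feature descriptors.
    s_vec = student_feat.mean(dim=(2, 3))
    t_vec = teacher_feat.mean(dim=(2, 3))
    image_kd = F.mse_loss(s_vec, t_vec)

    return alpha * pixel_kd + (1 - alpha) * image_kd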
The scarcity of annotated surgical data in robot-assisted surgery (RAS) has motivated prior works to borrow knowledge from related domains and adapt it to achieve promising segmentation results in surgical images. For dense instrument tracking in a robotic surgical video, collecting one initial scene to specify the target instruments (or parts of tools) is desirable and feasible during preoperative preparation. In this paper, we study the challenging problem of one-shot instrument segmentation for robotic surgical videos, in which only the first-frame mask of each video is provided at test time, such that the pre-trained model (learned from an easily accessible source) can adapt to the target instruments.
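A minimal baseline for this test-time setting is to briefly fine-tune the source-pretrained segmenter on the single annotated first frame before running it on the rest of the video; the sketch below illustrates that protocol only and is not the paper's adaptation scheme:

import torch
import torch.nn.functional as F

def adapt_to_first_frame(model, first_frame, first_mask, steps=50, lr=1e-4):
    """Fine-tune on the one provided first-frame mask, then segment the video.
    Illustrative baseline only."""
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        logits = model(first_frame.unsqueeze(0))            # (1, C, H, W)
        loss = F.cross_entropy(logits, first_mask.unsqueeze(0))
        opt.zero_grad()
        loss.backward()
        opt.step()
    model.eval()
    return model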
Surgical workflow recognition is a fundamental task in computer-assisted surgery and a key component of various applications in operating rooms. Existing deep learning models have achieved promising results for surgical workflow recognition, but they rely heavily on large amounts of annotated video. However, obtaining annotations is time-consuming and requires the domain knowledge of surgeons.
Int J Comput Assist Radiol Surg
September 2021
Purpose: Automatic segmentation of surgical instruments in robot-assisted minimally invasive surgery plays a fundamental role in improving context awareness. In this work, we present an instance segmentation model based on refined Mask R-CNN for accurately segmenting the instruments as well as identifying their types.
Methods: We re-formulate the instrument segmentation task as an instance segmentation task.
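For context, a plain (unrefined) torchvision Mask R-CNN configured for instrument instance segmentation looks as follows; this is the standard off-the-shelf recipe, not the paper's refined model, and the class count is a placeholder:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_instrument_maskrcnn(num_instrument_types):
    """Swap the box and mask heads so the model predicts background plus
    the instrument types. Illustrative baseline only."""
    num_classes = num_instrument_types + 1   # +1 for background
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)
    return model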
IEEE Trans Med Imaging
July 2021
Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted systems in the operating theatre. Previous works either jointly modeled spatial features with short, fixed-range temporal information or separately learned visual and long-term temporal cues. In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns to augment the present features.
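The memory-augmented idea of relating the current frame to a bank of long-range past features can be sketched with a simple attention read; the module below is illustrative only (sizes and head count are assumptions), not the TMRNet architecture:

import torch.nn as nn

class TemporalMemoryRead(nn.Module):
    """The current frame feature attends over a bank of past frame features,
    so long-range context augments the present representation."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4,
                                          batch_first=True)

    def forward(self, current_feat, memory_bank):
        # current_feat: (B, dim); memory_bank: (B, T_past, dim)
        q = current_feat.unsqueeze(1)                    # (B, 1, dim)
        ctx, _ = self.attn(q, memory_bank, memory_bank)  # read long-range cues
        return current_feat + ctx.squeeze(1)             # augmented feature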
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer- and robot-assisted interventions. While numerous methods for detecting, segmenting, and tracking medical instruments in endoscopic video images have been proposed in the literature, key limitations remain to be addressed, first among them robustness, that is, the reliable performance of state-of-the-art methods on challenging images.
Int J Comput Assist Radiol Surg
September 2020
Purpose: Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robot-assisted surgery. Existing deep learning approaches have achieved remarkable performance in analyzing surgical videos; however, they rely heavily on large-scale labelled datasets. Unfortunately, annotations are often not available in abundance, because they require the domain knowledge of surgeons.
Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis and essential components of various applications in modern operating rooms. Although these two tasks are highly correlated in clinical practice, since the surgical process is typically well defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) that exploits this relatedness to simultaneously boost the performance of both tasks.
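One simple way to couple the two heads is to let the phase prediction imply an expected tool-presence vector through a phase-tool co-occurrence prior and penalize disagreement; the sketch below illustrates that coupling under an assumed given prior and is not the exact MTRCNet-CL correlation loss:

import torch
import torch.nn.functional as F

def correlation_loss(phase_logits, tool_logits, cooccurrence):
    """cooccurrence: (num_phases, num_tools) prior of tool usage per phase
    (assumed given). The phase distribution implies an expected tool-presence
    vector, which the tool head is pushed to agree with. Illustrative only."""
    phase_prob = torch.softmax(phase_logits, dim=1)   # (B, num_phases)
    expected_tools = phase_prob @ cooccurrence        # (B, num_tools)
    tool_prob = torch.sigmoid(tool_logits)            # (B, num_tools)
    return F.mse_loss(tool_prob, expected_tools)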
Background: Nasopharyngeal carcinoma (NPC) may be cured with radiation therapy. Tumor proximity to critical structures demands accuracy in tumor delineation to avoid toxicities from radiation therapy; however, tumor target contouring for head and neck radiation therapy is labor intensive and highly variable among radiation oncologists. Purpose: To construct and validate an artificial intelligence (AI) contouring tool to automate primary gross tumor volume (GTV) contouring in patients with NPC.