Publications by authors named "Yutong Ban"

Article Synopsis
  • Surgery generates extensive video data that is crucial for research, clinical outcomes, and education, but the entire data lifecycle requires standardized frameworks to ensure quality and compliance with ethical regulations.
  • Working groups composed of clinicians and data experts used a modified Delphi process to develop structured recommendations across four key areas: Data Use, Structure, Exploration, and Governance.
  • The consensus offers clear guidelines for standardized surgical video data management, emphasizing transparency, diversity, and attention to legal and ethical issues, aiming to improve data utilization for all stakeholders involved.

Analysis of relations between objects and comprehension of abstract concepts in surgical video are important in AI-augmented surgery. However, building models that integrate our knowledge and understanding of surgery remains a challenging endeavor. In this paper, we propose a novel way to integrate conceptual knowledge into temporal analysis tasks using temporal concept graph networks.

Transformers have demonstrated superior performance on a wide variety of tasks since they were introduced. In recent years, they have drawn attention from the vision community in tasks such as image classification and object detection. Despite this wave, an accurate and efficient multiple-object tracking (MOT) method based on transformers is yet to be designed.

Background: Operative courses of laparoscopic cholecystectomies vary widely due to differing pathologies. Efforts to assess intra-operative difficulty include the Parkland grading scale (PGS), which scores inflammation from the initial view of the gallbladder on a 1-5 scale. We investigated the impact of PGS on intra-operative outcomes, including laparoscopic duration, attainment of the critical view of safety (CVS), and gallbladder injury.

Annotation of surgical video is important for establishing ground truth in surgical data science endeavors that involve computer vision. With the growth of the field over the last decade, several challenges have been identified in annotating spatial, temporal, and clinical elements of surgical video as well as challenges in selecting annotators. In reviewing current challenges, we provide suggestions on opportunities for improvement and possible next steps to enable translation of surgical data science efforts in surgical video analysis to clinical research and practice.

The fields of computer vision (CV) and artificial intelligence (AI) have undergone rapid advancements in the past decade, many of which have been applied to the analysis of intraoperative video. These advances are driven by the widespread application of deep learning, which leverages multiple layers of neural networks to teach computers complex tasks. Prior to these advances, applications of AI in the operating room were limited by our relative inability to train computers to accurately understand images with traditional machine learning (ML) techniques.

Background: Artificial intelligence (AI) and computer vision (CV) have revolutionized image analysis. In surgery, CV applications have focused on surgical phase identification in laparoscopic videos. We proposed to apply CV techniques to identify phases in an endoscopic procedure, peroral endoscopic myotomy (POEM).

In this article, we address the problem of tracking multiple speakers via the fusion of visual and auditory information. We propose to exploit the complementary nature and roles of these two modalities in order to accurately estimate smooth trajectories of the tracked persons, to deal with the partial or total absence of one of the modalities over short periods of time, and to estimate the acoustic status (either speaking or silent) of each tracked person over time. We propose to cast the problem at hand into a generative audio-visual fusion (or association) model formulated as a latent-variable temporal graphical model.
