In the past half decade, automatic speech recognition techniques and the supporting software and hardware have matured enough to support sophisticated medical applications. The project described here aimed to introduce a computer-based, voice-controlled prototype system in a simulated vitreo-retinal surgery scenario. The goal was to provide the surgeon with a tool that could significantly improve the quality and ease of work and shorten the duration of the intervention. The speech recognition system allows voice entry of simple commands to simulate surgical instrument control, including the infusion pump, vitreous cutter and diathermy. The project relies on a Markov-based, speaker-dependent, commercial isolated-word recognizer and consists of a specific recognition vocabulary and application software created and developed by the authors. Results have been encouraging: the system performed well under the test conditions, proving robust, simple to use and accurate (over 97% average word recognition rate). On the basis of their experience, the authors believe that automatic speech recognition technology, though limited by the need for training, speaker dependence and a relatively small vocabulary, and requiring extensive testing under operating conditions, merits further development and opens new perspectives for a possible new generation of surgical instruments.
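To make the command-and-control idea concrete, the following is a minimal sketch of the application layer only: the recognizer itself was a commercial Markov-based isolated-word engine, so it is stubbed out here, and every command phrase, threshold and handler name is a hypothetical illustration rather than the authors' actual vocabulary or software.

```python
# Sketch of a voice-command dispatch layer for simulated surgical instruments.
# The commercial isolated-word recognizer is represented only by its output
# (word + confidence score); all names and values below are hypothetical.

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RecognitionResult:
    word: str          # best-matching vocabulary entry returned by the engine
    confidence: float  # engine score in [0, 1]


class InstrumentConsole:
    """Simulated surgical instruments driven by voice commands."""

    def infusion_pump(self, on: bool) -> None:
        print(f"infusion pump {'ON' if on else 'OFF'}")

    def vitreous_cutter(self, on: bool) -> None:
        print(f"vitreous cutter {'ON' if on else 'OFF'}")

    def diathermy(self, on: bool) -> None:
        print(f"diathermy {'ON' if on else 'OFF'}")


def build_vocabulary(console: InstrumentConsole) -> Dict[str, Callable[[], None]]:
    # Isolated-word vocabulary: one short phrase per instrument action.
    return {
        "pump on": lambda: console.infusion_pump(True),
        "pump off": lambda: console.infusion_pump(False),
        "cutter on": lambda: console.vitreous_cutter(True),
        "cutter off": lambda: console.vitreous_cutter(False),
        "diathermy on": lambda: console.diathermy(True),
        "diathermy off": lambda: console.diathermy(False),
    }


def dispatch(result: RecognitionResult,
             vocabulary: Dict[str, Callable[[], None]],
             reject_below: float = 0.9) -> bool:
    """Execute the action for a recognized command, rejecting low-confidence hits."""
    if result.confidence < reject_below or result.word not in vocabulary:
        print(f"rejected: '{result.word}' (confidence {result.confidence:.2f})")
        return False
    vocabulary[result.word]()
    return True


if __name__ == "__main__":
    console = InstrumentConsole()
    vocab = build_vocabulary(console)
    # Stub recognizer output, standing in for the commercial engine.
    dispatch(RecognitionResult("cutter on", 0.97), vocab)
    dispatch(RecognitionResult("diathermy on", 0.55), vocab)  # rejected
```

A rejection threshold of this kind is one plausible way to trade a few missed commands for fewer false activations, which matters when the commands drive instruments rather than text entry.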


Source: http://dx.doi.org/10.1177/112067219600600420

Publication Analysis

Top Keywords

speech recognition (16); automatic speech (12); vitreo-retinal surgery (8); computer-based voice-controlled (8); recognition (6); recognition vitreo-retinal (4); surgery project (4); project prototypal (4); prototypal computer-based (4); voice-controlled vitrectomy (4)

Similar Publications

The dataset represents a significant advancement in Bengali lip-reading and visual speech recognition research, poised to drive future applications and technological progress. Despite Bengali's status as the seventh most spoken language globally, with approximately 265 million speakers, it has been largely overlooked by the research community. The dataset fills this gap by offering a pioneering resource tailored for Bengali lip-reading, comprising visual data from 150 speakers across 54 classes encompassing Bengali phonemes, alphabets, and symbols.


Objective: Measuring listening effort using pupillometry is challenging in cochlear implant (CI) users. We assess three validated speech tests (Matrix, LIST, and DIN) to identify the optimal speech material for measuring peak-pupil-dilation (PPD) in CI users as a function of signal-to-noise ratio (SNR).

Design: Speech tests were administered in quiet and two noisy conditions, namely at the speech recognition threshold (0 dB re SRT), i.
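As background for the outcome measure, peak pupil dilation is commonly computed as the maximum baseline-corrected pupil size within a trial window. The sketch below illustrates only that generic computation, with a hypothetical sampling rate and baseline duration; it is not the authors' analysis pipeline.

```python
# Generic peak-pupil-dilation (PPD) computation: maximum pupil size relative
# to the mean of a pre-stimulus baseline. Sampling rate and window lengths
# are hypothetical and depend on the eye tracker and protocol.

import numpy as np


def peak_pupil_dilation(trace: np.ndarray,
                        fs: float = 60.0,
                        baseline_s: float = 1.0) -> float:
    """Return the maximum dilation relative to the pre-stimulus baseline.

    trace      : pupil diameter samples for one trial (baseline + response)
    fs         : sampling rate in Hz (assumed value)
    baseline_s : duration of the baseline segment at the start of the trace
    """
    n_baseline = int(baseline_s * fs)
    baseline = np.nanmean(trace[:n_baseline])
    response = trace[n_baseline:]
    return float(np.nanmax(response - baseline))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic trial: flat baseline, then a slow dilation peaking mid-trial.
    t = np.linspace(0, 4, 240)
    trial = 3.0 + 0.4 * np.exp(-((t - 2.5) ** 2) / 0.3) \
        + 0.01 * rng.standard_normal(t.size)
    print(f"PPD = {peak_pupil_dilation(trial):.3f} mm")
```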


Tibetan-Chinese speech-to-speech translation based on discrete units.

Sci Rep

January 2025

Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China.

Speech-to-speech translation (S2ST) has evolved from cascade systems, which integrate Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), to end-to-end models. This evolution has been driven by advancements in model performance and the expansion of cross-lingual speech datasets. Despite the paucity of research on Tibetan speech translation, this paper tackles Tibetan-to-Chinese direct speech-to-speech translation within a multi-task learning framework, employing self-supervised learning (SSL) and sequence-to-sequence model training.
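The contrast between the two families of systems mentioned in the abstract can be sketched schematically. Every component below is a stub with hypothetical names showing the data flow only, not the paper's actual models.

```python
# Schematic contrast: cascade S2ST (ASR -> MT -> TTS) versus a direct model
# that predicts target-language discrete units and synthesizes speech from
# them. All functions are stubs; names and return values are illustrative.

from typing import List

Waveform = List[float]  # placeholder for an audio signal


# --- cascade pipeline -------------------------------------------------------
def asr(tibetan_speech: Waveform) -> str:
    """Stub ASR: Tibetan speech -> Tibetan text."""
    return "<tibetan transcript>"


def mt(tibetan_text: str) -> str:
    """Stub MT: Tibetan text -> Chinese text."""
    return "<chinese translation>"


def tts(chinese_text: str) -> Waveform:
    """Stub TTS: Chinese text -> Chinese speech."""
    return [0.0]


def cascade_s2st(tibetan_speech: Waveform) -> Waveform:
    return tts(mt(asr(tibetan_speech)))


# --- direct, discrete-unit pipeline ----------------------------------------
def speech_to_units(tibetan_speech: Waveform) -> List[int]:
    """Stub sequence-to-sequence model predicting target-language discrete
    units (e.g. clustered SSL features) from source speech."""
    return [12, 7, 98]


def unit_vocoder(units: List[int]) -> Waveform:
    """Stub vocoder synthesizing Chinese speech from the discrete units."""
    return [0.0]


def direct_s2st(tibetan_speech: Waveform) -> Waveform:
    return unit_vocoder(speech_to_units(tibetan_speech))
```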


Some Challenging Questions About Outcomes in Children With Cochlear Implants.

Perspect ASHA Spec Interest Groups

December 2024

DeVault Otologic Research Laboratory, Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis.

Purpose: Cochlear implants (CIs) have improved the quality of life for many children with severe-to-profound sensorineural hearing loss. Despite the reported CI benefits of improved speech recognition, speech intelligibility, and spoken language processing, large individual differences in speech and language outcomes are still consistently reported in the literature. The enormous variability in CI outcomes has made it challenging to predict which children may be at high risk for limited benefits and how potential risk factors can be addressed with interventions.


Introduction: It remains under debate whether and how semantic content modulates emotional prosody perception in children with autism spectrum disorder (ASD). The current study investigated this issue in two experiments by systematically manipulating semantic information in Chinese disyllabic words.

Method: The present study explored the potential modulatory effect of semantic content complexity on emotional prosody perception in Mandarin-speaking children with ASD.

