Over the past five years, automatic speech recognition techniques and the supporting software and hardware have matured enough to support sophisticated medical applications. The project described here aimed to introduce a computer-based, voice-controlled prototype system into a simulated vitreo-retinal surgery scenario. The goal was to give the surgeon a tool that could significantly improve the quality and ease of work and shorten the intervention. The speech recognition system accepts simple spoken commands that simulate control of surgical instruments, including the infusion pump, vitreous cutter and diathermy. It is built on a commercial, speaker-dependent, Markov-based isolated-word recognizer, together with a recognition vocabulary and application software created and developed by the authors. Results have been encouraging: under the test conditions the system proved robust, simple to use and accurate, with an average word recognition rate above 97%. The technology still has limitations, namely the need for training, speaker dependence and a relatively small vocabulary, and it requires extensive testing under real operating conditions. Nevertheless, on the basis of their experience, the authors believe that automatic speech recognition merits further development and opens new perspectives for a possible new generation of surgical instruments.
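To make the command-and-control idea concrete, the following is a minimal Python sketch, not the authors' software. It assumes a hypothetical two-word grammar (a device word such as "pump" selects an instrument, then an action word such as "up" acts on it) and a hypothetical vocabulary; the abstract does not describe the actual vocabulary, grammar or instrument settings. The recognizer itself is out of scope: the sketch assumes it already emits one word per isolated utterance. It also shows how a word recognition rate is typically computed for an isolated-word test; how the authors averaged their figure is not stated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical isolated-word vocabulary (assumption, not from the paper).
DEVICES = {"pump", "cutter", "diathermy"}
ACTIONS = {"on", "off", "up", "down"}


@dataclass
class Instrument:
    """Simulated instrument with an on/off state and a bounded integer setting."""
    name: str
    active: bool = False
    level: int = 0
    max_level: int = 10

    def apply(self, action: str) -> None:
        if action == "on":
            self.active = True
        elif action == "off":
            self.active = False
        elif action == "up":
            self.level = min(self.level + 1, self.max_level)
        elif action == "down":
            self.level = max(self.level - 1, 0)


@dataclass
class VoiceController:
    """Hypothetical grammar: a device word selects the target, an action word acts on it."""
    instruments: Dict[str, Instrument] = field(
        default_factory=lambda: {d: Instrument(d) for d in DEVICES}
    )
    selected: Optional[str] = None

    def hear(self, word: str) -> bool:
        """Process one recognized word; return True if it was accepted."""
        word = word.lower()
        if word in DEVICES:
            self.selected = word
            return True
        if word in ACTIONS and self.selected is not None:
            self.instruments[self.selected].apply(word)
            return True
        # Out-of-vocabulary word, or an action spoken before any device was selected.
        return False


def word_recognition_rate(reference: List[str], recognized: List[str]) -> float:
    """Fraction of isolated-word utterances recognized correctly (one hypothesis per utterance)."""
    if len(reference) != len(recognized):
        raise ValueError("isolated-word test expects one hypothesis per utterance")
    if not reference:
        raise ValueError("empty test set")
    correct = sum(r == h for r, h in zip(reference, recognized))
    return correct / len(reference)


if __name__ == "__main__":
    ctrl = VoiceController()
    for w in ["cutter", "on", "up", "up", "pump", "down"]:
        ctrl.hear(w)
    print(ctrl.instruments["cutter"].active, ctrl.instruments["cutter"].level)  # True 2
    print(word_recognition_rate(["pump", "on", "off", "up"], ["pump", "on", "on", "up"]))  # 0.75
```

Requiring a device word before any action is one simple way to keep a small isolated-word vocabulary unambiguous; a misrecognized action word then changes only the currently selected instrument, and an action with no selection is rejected.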
DOI: http://dx.doi.org/10.1177/112067219600600420
Data Brief
February 2025
Department of Electrical, Electronic and Communication Engineering, Military Institute of Science and Technology (MIST), Dhaka 1216, Bangladesh.
The dataset represents a significant advancement in Bengali lip-reading and visual speech recognition research and is poised to drive future applications and technological progress. Although Bengali is the seventh most spoken language worldwide, with approximately 265 million speakers, it and other linguistically rich, widely spoken languages have been largely overlooked by the research community. The dataset fills this gap as a pioneering resource tailored for Bengali lip-reading, comprising visual data from 150 speakers across 54 classes that cover Bengali phonemes, alphabets and symbols.
Int J Audiol
January 2025
Department of Otorhinolaryngology and Head & Neck Surgery, Leiden University Medical Center, Leiden, Netherlands.
Objective: Measuring listening effort using pupillometry is challenging in cochlear implant (CI) users. We assess three validated speech tests (Matrix, LIST, and DIN) to identify the optimal speech material for measuring peak-pupil-dilation (PPD) in CI users as a function of signal-to-noise ratio (SNR).
Design: Speech tests were administered in quiet and two noisy conditions, namely at the speech recognition threshold (0 dB re SRT), i.
Sci Rep
January 2025
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China.
Speech-to-speech translation (S2ST) has evolved from cascade systems which integrate Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), to end-to-end models. This evolution has been driven by advancements in model performance and the expansion of cross-lingual speech datasets. Despite the paucity of research on Tibetan speech translation, this paper endeavors to tackle the challenge of Tibetan-to-Chinese direct speech-to-speech translation within the multi-task learning framework, employing self-supervised learning (SSL) and sequence-to-sequence model training.
Perspect ASHA Spec Interest Groups
December 2024
DeVault Otologic Research Laboratory, Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis.
Purpose: Cochlear implants (CIs) have improved the quality of life for many children with severe-to-profound sensorineural hearing loss. Despite the reported CI benefits of improved speech recognition, speech intelligibility, and spoken language processing, large individual differences in speech and language outcomes are still consistently reported in the literature. The enormous variability in CI outcomes has made it challenging to predict which children may be at high risk for limited benefits and how potential risk factors can be improved with interventions.
J Commun Disord
January 2025
School of Foreign Studies, China University of Petroleum (East China), Qingdao, China. Electronic address:
Introduction: It is still debated whether and how semantic content modulates emotional prosody perception in children with autism spectrum disorder (ASD). The current study investigated this question in two experiments that systematically manipulated semantic information in Chinese disyllabic words.
Method: The present study explored the potential modulation of semantic content complexity on emotional prosody perception in Mandarin-speaking children with ASD.