Voice banking involves recording an inventory of sentences produced via natural speech. The recordings are used to create a synthetic text-to-speech voice that can be installed on speech-generating devices. This study highlights a minimally researched, clinically relevant issue surrounding the development and evaluation of Singaporean-accented English synthetic voices that were created using readily available voice banking software and hardware. Processes used to create seven unique synthetic voices that produce Singaporean-accented English, and the development of a custom Singaporean Colloquial English (SCE) recording inventory, are reviewed. The perspectives of adults who spoke SCE and banked their voices for this project are summarized; they were generally positive. Finally, 100 adults familiar with SCE participated in an experiment that evaluated the intelligibility and naturalness of the Singaporean-accented synthetic voices, as well as the effect of the SCE custom inventory on listener preferences. The addition of the custom SCE inventory did not affect intelligibility or naturalness of the synthetic speech, and listeners tended to prefer the voice created with the SCE inventory when the stimulus was an SCE passage. The procedures used in this project may be helpful for interventionists who wish to create synthetic voices with accents that are not commercially available.
DOI: http://dx.doi.org/10.1080/07434618.2023.2181213
Open Res Eur
January 2025
Center for Innovative Research and Liaison, Wakayama University, Wakayama, Wakayama Prefecture, Japan.
The purpose of this paper is to make easily available to the scientific community an efficient voice morphing tool called STRAIGHTMORPH and provide a short tutorial on its use with examples. STRAIGHTMORPH consists of a set of Matlab functions allowing the generation of high-quality, parametrically-controlled morphs of an arbitrary number of voice samples. A first step consists in extracting an 'mObject' for each voice sample, with accurate tracking of the fundamental frequency contour and manual definition of Time and Frequency anchors corresponding across samples to be morphed.
Int J Lang Commun Disord
January 2025
Division of Communication Sciences and Disorders, University of Cape Town, Rondebosch, South Africa.
Background: There is a global need for synthetic speech development in multiple languages and dialects, as many children who cannot communicate using their natural voice struggle to find synthetic voices on high-technology devices that match their age, social and linguistic background.
Aims: To document multiple stakeholders' perspectives surrounding the quality, acceptability and utility of newly created synthetic speech in three under-resourced South African languages, namely South African English, Afrikaans and isiXhosa.
Methods & Procedures: A mixed methods research design was selected.
Sci Rep
January 2025
NeMO Lab, ASST GOM Niguarda Cà Granda Hospital, Milan, Italy.
Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease that can result in a progressive loss of speech due to bulbar dysfunction, which can have significant negative impact on the patient's mental well-being. Alternative Augmentative Communication (AAC) strategies based on synthetic voices have been shown to assist patients in maintaining communication and improving their Quality of Life (QoL). However, such synthetic voices are often perceived as impersonal and fail to capture the unique voice and identity of the patient.
Front Physiol
November 2024
Department of Otorhinolaryngology, Medical School, Division of Phoniatrics and Pediatric Audiology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Head and Neck Surgery, University Hospital Erlangen, Erlangen, Waldstrasse, Germany.
Eur J Radiol
January 2025
School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Ismaninger Str. 22, 81675 Munich, Germany.
Purpose: Large language models (LLMs) promise to streamline radiology reporting. With the release of OpenAI's GPT-4o (Generative Pre-trained Transformers-4 omni), which processes not only text but also speech, multimodal LLMs might now also be used as medical speech recognition software for radiology reporting in multiple languages. This proof-of-concept study investigates the feasibility of using GPT-4o for automated voice-to-text transcription of radiology reports in English and German.