Voice banking involves recording an inventory of sentences produced via natural speech. The recordings are used to create a synthetic text-to-speech voice that can be installed on speech-generating devices. This study highlights a minimally researched, clinically relevant issue surrounding the development and evaluation of Singaporean-accented English synthetic voices that were created using readily available voice banking software and hardware. Processes used to create seven unique synthetic voices that produce Singaporean-accented English, and the development of a custom Singaporean Colloquial English (SCE) recording inventory, are reviewed. The perspectives of adults who spoke SCE and banked their voices for this project are summarized; they were generally positive. Finally, 100 adults familiar with SCE participated in an experiment that evaluated the intelligibility and naturalness of the Singaporean-accented synthetic voices, as well as the effect of the SCE custom inventory on listener preferences. The addition of the custom SCE inventory did not affect intelligibility or naturalness of the synthetic speech, and listeners tended to prefer the voice created with the SCE inventory when the stimulus was an SCE passage. The procedures used in this project may be helpful for interventionists who wish to create synthetic voices with accents that are not commercially available.

Source
http://dx.doi.org/10.1080/07434618.2023.2181213

Similar Publications

STRAIGHTMORPH: A Voice Morphing Tool for Research in Voice Communication Sciences.

Open Res Eur

January 2025

Center for Innovative Research and Liaison, Wakayama University, Wakayama, Wakayama Prefecture, Japan.

The purpose of this paper is to make easily available to the scientific community an efficient voice morphing tool called STRAIGHTMORPH and provide a short tutorial on its use with examples. STRAIGHTMORPH consists of a set of Matlab functions allowing the generation of high-quality, parametrically-controlled morphs of an arbitrary number of voice samples. A first step consists in extracting an 'mObject' for each voice sample, with accurate tracking of the fundamental frequency contour and manual definition of Time and Frequency anchors corresponding across samples to be morphed.

Background: There is a global need for synthetic speech development in multiple languages and dialects, as many children who cannot communicate using their natural voice struggle to find synthetic voices on high-technology devices that match their age, social and linguistic background.

Aims: To document multiple stakeholders' perspectives surrounding the quality, acceptability and utility of newly created synthetic speech in three under-resourced South African languages, namely South African English, Afrikaans and isiXhosa.

Methods & Procedures: A mixed methods research design was selected.

Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease that can result in a progressive loss of speech due to bulbar dysfunction, which can have a significant negative impact on the patient's mental well-being. Alternative Augmentative Communication (AAC) strategies based on synthetic voices have been shown to assist patients in maintaining communication and improving their Quality of Life (QoL). However, such synthetic voices are often perceived as impersonal and fail to capture the unique voice and identity of the patient.

Influence of flow rate and fiber tension on dynamical, mechanical and acoustical parameters in a synthetic larynx model with integrated fibers.

Front Physiol

November 2024

Department of Otorhinolaryngology, Medical School, Division of Phoniatrics and Pediatric Audiology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Head and Neck Surgery, University Hospital Erlangen, Erlangen, Waldstrasse, Germany.

Article Synopsis
  • The study investigates how airflow and fiber tension affect voice production by analyzing the oscillation of vocal folds using a synthetic larynx model.
  • It involved 76 experiments measuring various factors like vocal fold motion and sound output, with a focus on how flow rate and tension vary.
  • Results showed that while flow rate mainly influences phonation characteristics, the fundamental frequency and quality of the sound are largely determined by the tension of the vocal folds.

Multilingual feasibility of GPT-4o for automated Voice-to-Text CT and MRI report transcription.

Eur J Radiol

January 2025

School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Ismaninger Str. 22, 81675 Munich, Germany.

Purpose: Large language models (LLMs) promise to streamline radiology reporting. With the release of OpenAI's GPT-4o (Generative Pre-trained Transformers-4 omni), which processes not only text but also speech, multimodal LLMs might now also be used as medical speech recognition software for radiology reporting in multiple languages. This proof-of-concept study investigates the feasibility of using GPT-4o for automated voice-to-text transcription of radiology reports in English and German.
