Voice banking to support individuals who use speech-generating devices: development and evaluation of Singaporean-accented English synthetic voices and a Singapore Colloquial English recording inventory.

Mo Chen, Jolene Hyppa-Martin, H Timothy Bunnell, Jason Lilley, Celestine Foo, Han Wei Tan, Wei Shun Lim

Augment Altern Commun

College of Humanities, Arts, and Social Sciences, Nanyang Technological University, Singapore.

Published: December 2023

Voice banking involves recording an inventory of sentences produced via natural speech. The recordings are used to create a synthetic text-to-speech voice that can be installed on speech-generating devices. This study highlights a minimally researched, clinically relevant issue surrounding the development and evaluation of Singaporean-accented English synthetic voices that were created using readily available voice banking software and hardware. Processes used to create seven unique synthetic voices that produce Singaporean-accented English, and the development of a custom Singapore Colloquial English (SCE) recording inventory, are reviewed. The perspectives of adults who spoke SCE and banked their voices for this project are summarized; they were generally positive. Finally, 100 adults familiar with SCE participated in an experiment that evaluated the intelligibility and naturalness of the Singaporean-accented synthetic voices, as well as the effect of the SCE custom inventory on listener preferences. The addition of the custom SCE inventory did not affect intelligibility or naturalness of the synthetic speech, and listeners tended to prefer the voice created with the SCE inventory when the stimulus was an SCE passage. The procedures used in this project may be helpful for interventionists who wish to create synthetic voices with accents that are not commercially available.

http://dx.doi.org/10.1080/07434618.2023.2181213


Similar Publications

STRAIGHTMORPH: A Voice Morphing Tool for Research in Voice Communication Sciences.

Open Res Eur

January 2025

Center for Innovative Research and Liaison, Wakayama University, Wakayama, Wakayama Prefecture, Japan.

P Belin, H Kawahara

The purpose of this paper is to make easily available to the scientific community an efficient voice morphing tool called STRAIGHTMORPH and provide a short tutorial on its use with examples. STRAIGHTMORPH consists of a set of Matlab functions allowing the generation of high-quality, parametrically-controlled morphs of an arbitrary number of voice samples. A first step consists in extracting an 'mObject' for each voice sample, with accurate tracking of the fundamental frequency contour and manual definition of Time and Frequency anchors corresponding across samples to be morphed.

Do you like my voice? Stakeholder perspectives about the acceptability of synthetic child voices in three South African languages.

Int J Lang Commun Disord

January 2025

Division of Communication Sciences and Disorders, University of Cape Town, Rondebosch, South Africa.

Camryn Claire Terblanche, Michelle Pascoe, Michal Harty

Background: There is a global need for synthetic speech development in multiple languages and dialects, as many children who cannot communicate using their natural voice struggle to find synthetic voices on high-technology devices that match their age, social and linguistic background.

Aims: To document multiple stakeholders' perspectives surrounding the quality, acceptability and utility of newly created synthetic speech in three under-resourced South African languages, namely South African English, Afrikaans and isiXhosa.

Methods & Procedures: A mixed methods research design was selected.

Artificial intelligence empowered voice generation for amyotrophic lateral sclerosis patients.

Sci Rep

January 2025

NeMO Lab, ASST GOM Niguarda Cà Granda Hospital, Milan, Italy.

Stefano Regondi, Giordana Donvito, Emanuele Frontoni, Milutin Kostovic, Fabio Minazzi

Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease that can result in a progressive loss of speech due to bulbar dysfunction, which can have a significant negative impact on the patient's mental well-being. Augmentative and Alternative Communication (AAC) strategies based on synthetic voices have been shown to assist patients in maintaining communication and improving their Quality of Life (QoL). However, such synthetic voices are often perceived as impersonal and fail to capture the unique voice and identity of the patient.

Influence of flow rate and fiber tension on dynamical, mechanical and acoustical parameters in a synthetic larynx model with integrated fibers.

Front Physiol

November 2024

Department of Otorhinolaryngology, Medical School, Division of Phoniatrics and Pediatric Audiology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Head and Neck Surgery, University Hospital Erlangen, Erlangen, Waldstrasse, Germany.

Lucia Gühring, Bogac Tur, Marion Semmler, Anne Schützenberger, Stefan Kniesburges

Article Synopsis

The study investigates how airflow and fiber tension affect voice production by analyzing the oscillation of vocal folds using a synthetic larynx model.
It involved 76 experiments measuring various factors like vocal fold motion and sound output, with a focus on how flow rate and tension vary.
Results showed that while flow rate mainly influences phonation characteristics, the fundamental frequency and quality of the sound are largely determined by the tension of the vocal folds.

Multilingual feasibility of GPT-4o for automated Voice-to-Text CT and MRI report transcription.

Eur J Radiol

January 2025

School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Ismaninger Str. 22, 81675 Munich, Germany.

Felix Busch, Philipp Prucker, Alexander Komenda, Sebastian Ziegelmayer, Marcus R Makowski

Purpose: Large language models (LLMs) promise to streamline radiology reporting. With the release of OpenAI's GPT-4o (Generative Pre-trained Transformers-4 omni), which processes not only text but also speech, multimodal LLMs might now also be used as medical speech recognition software for radiology reporting in multiple languages. This proof-of-concept study investigates the feasibility of using GPT-4o for automated voice-to-text transcription of radiology reports in English and German.
