Given an orthographic transcription, forced alignment systems automatically determine boundaries between segments in speech, facilitating the use of large corpora. In the present paper, we introduce a neural network-based forced alignment system, the Mason-Alberta Phonetic Segmenter (MAPS). MAPS serves as a testbed for two possible improvements we pursue for forced alignment systems.
Augment Altern Commun
July 2024
It is well known that children with expressive communication difficulties have the right to communicate, but they should also have the right to do so in whichever language they choose, with a voice that closely matches their age, gender, and dialect. This study aimed to develop naturalistic synthetic child speech, matching the vocal identity of three children with expressive communication difficulties, using Tacotron 2, for three under-resourced South African languages, namely South African English (SAE), Afrikaans, and isiXhosa. Due to the scarcity of child speech corpora, 2 hours of child speech data per child was collected from three 11- to 12-year-old children.
This study constitutes an investigation into the acoustic variability of intervocalic alveolar taps in a corpus of spontaneous speech from Madrid, Spain. Substantial variability was documented in this segment, with highly reduced variants constituting roughly half of all tokens during spectrographic inspection. In addition to qualitative documentation, the intensity difference between the tap and surrounding vowels was measured.
This study examines the role of frequencies above 8 kHz in the classification of conversational speech fricatives [f, v, θ, ð, s, z, ʃ, ʒ, h] in random forest modeling. Prior research has mostly focused on spectral measures for fricative categorization using frequency information below 8 kHz. The contribution of higher frequencies has received only limited attention, especially for non-laboratory speech.
The papers in this special issue provide a critical look at some historical ideas that have had an influence on research and teaching in the field of speech communication. They also address widely used methodologies or address long-standing methodological challenges in the areas of speech perception and speech production. The goal is to reconsider and evaluate the need for caution or replacement of historical ideas with more modern results and methods.
While known to influence visual lexical processing, the semantic information we associate with words has recently been found to influence auditory lexical processing as well. The present work explored the influence of in auditory lexical decision. Study 1 recreated an experiment investigating semantic richness effects in concrete nouns (Goh et al.
The present study compares the production of fricatives in conversational versus read speech in American English. The goal is to examine which parameters contribute to the identification of fricatives across the two speech styles. The study surveys over 162 000 fricative tokens from the Buckeye Corpus [Pitt, Johnson, Hume, Kiesling, and Raymond (2005).
JASA Express Lett
August 2021
The present study investigates the informativity of anticipatory coarticulatory acoustic detail about inflectional suffixes in English verbs, performing two experiments in which listeners classified inflectional functions of verbs. Listener response latencies were slower when acoustic detail resulting from anticipatory coarticulation mismatched with the inflectional suffix. The results indicate that listeners actively use coarticulatory phonetic detail to predict the verbs' inflectional function.
We present an implementation of DIANA, a computational model of spoken word recognition, to model responses collected in the Massive Auditory Lexical Decision (MALD) project. DIANA is an end-to-end model, including an activation and decision component that takes the acoustic signal as input, activates internal word representations, and outputs lexicality judgments and estimated response latencies. Simulation 1 presents the process of creating acoustic models required by DIANA to analyze novel speech input.
Recent evidence indicates that a word's paradigmatic neighbors affect production. However, these findings have mostly been obtained in careful laboratory settings using words in isolation, and thus ignoring potential effects that may arise from the syntagmatic context, which is typically present in spontaneous speech. The current corpus analysis investigates paradigmatic and syntagmatic effects in Estonian spontaneous speech.
In conversational speech, phones and entire syllables are often missing. This can make "he's" and "he was" homophonous, realized for example as [ɨz]. Similarly, "you're" and "you were" can both be realized as [jɚ], [ɨ], etc.
J Acoust Soc Am
February 2022
Using phonological neighborhood density has been a common method to quantify lexical competition. It is useful and convenient but has shortcomings that are worth reconsidering. The present study quantifies the effects of lexical competition during spoken word recognition using acoustic distance and acoustic absement rather than phonological neighborhood density.
Morphology (Dordr)
February 2021
Many theories of word structure in linguistics and morphological processing in cognitive psychology are grounded in a compositional perspective on the (mental) lexicon in which complex words are built up during speech production from sublexical elements such as morphemes, stems, and exponents. When combined with the hypothesis that storage in the lexicon is restricted to the irregular, the prediction follows that properties specific to regular inflected words cannot co-determine the phonetic realization of these inflected words. This study shows that the stem vowels of regular English inflected verb forms that are more frequent in their paradigm are produced with more enhanced articulatory gestures in the midsagittal plane, challenging compositional models of lexical processing.
Background: Major depressive disorder (MDD) is the second highest cause of disability worldwide. Standard treatments for MDD include medicine and talk therapy; however, approximately 1 in 5 Canadians fail to respond to these approaches and must consider alternatives. Transcranial direct current stimulation (tDCS) is a safe, noninvasive method that uses electrical stimulation to change the activation pattern of different brain regions.
In this overview we describe literature on how speech production and speech perception change in healthy or normal aging across the adult lifespan. In the production section we review acoustic characteristics that have been investigated as potentially distinguishing younger and older adults. In the speech perception section studies concerning speaker age estimation and those investigating older listeners' perception are addressed.
Children with cerebral palsy (CP) are characterized as difficult to understand because of poor articulation and breathy voice quality. This case series describes the subsystems of the speech mechanism (i.e.
Repeating the movements associated with activities such as drawing or sports typically leads to improvements in kinematic behavior: these movements become faster, smoother, and exhibit less variation. Likewise, practice has also been shown to lead to faster and smoother movement trajectories in speech articulation. However, little is known about its effect on articulatory variability.
Producing speech that is clear, audible, and intelligible to others is a challenge for many children with cerebral palsy (CP) and children with Down syndrome (DS). Previous studies have demonstrated the effectiveness of using the Lee Silverman Voice Treatment (LSVT LOUD®) to increase vocal loudness and improve speech intelligibility in individuals with dysarthria secondary to Parkinson's disease (PD), and some research suggests that it also may be effective for individuals with dysarthria secondary to other conditions, including CP and DS. Although LSVT LOUD targets healthy vocal loudness, there is some evidence of spreading effects to the articulatory system.
J Acoust Soc Am
April 2020
As scientists, it is important to sample as broadly as possible; however, there is a bias in the research performed on the speech acoustics of the world's languages toward work on languages of convenience (e.g., English).
Multiple measures of vowel overlap have been proposed that use F1, F2, and duration to calculate the degree of overlap between vowel categories. The present study assesses four of these measures: the spectral overlap assessment metric [SOAM; Wassink (2006). J.
Previous research has shown that compound word recognition involves selecting a relational meaning (e.g., 'box for letters' for ) out of a set of competing relational meanings for the same compound.
The Massive Auditory Lexical Decision (MALD) database is an end-to-end, freely available auditory and production data set for speech and psycholinguistic research, providing time-aligned stimulus recordings for 26,793 words and 9592 pseudowords, and response data for 227,179 auditory lexical decisions from 231 unique monolingual English listeners. In addition to the experimental data, we provide many precompiled listener- and item-level descriptor variables. This data set makes it easy to explore responses, build and test theories, and compare a wide range of models.
Spoken language manifests itself as change over time in various acoustic dimensions. While it seems clear that acoustic-phonetic information in the speech signal is key to language processing, little is currently known about which specific types of acoustic information are relatively more informative to listeners. This problem is likely compounded when considering reduced speech: Which specific acoustic information do listeners rely on when encountering spoken forms that are highly variable, and often include altered or elided segments? This work explores contributions of spectral shape, f0 contour, target duration, and time varying intensity in the perception of reduced speech.
Atten Percept Psychophys
October 2015
In this study, we examined speaker-dependent (acoustic) and speaker-independent (lexical) linguistic influences on perceived foreign accentedness. Accentedness ratings assigned to Chinese-accented English words were analyzed, taking accentedness as a continuum. The speaker-dependent variables were included as acoustic distances, measured in relation to typical native-speaker values.
Lexical tone identification requires a number of secondary cues when main tonal contours are unavailable. In this article, we examine Mandarin native speakers' ability to identify lexical tones by extracting tonal information from sonorant onset pitch (onset contours) on syllable-initial nasals ranging from 50 to 70 ms in duration. In experiments I and II we test speakers' ability to identify lexical tones in a second syllable with and without onset contours in isolation (experiment I) and in a sentential context (experiment II).