Perception of sinewave vowels.

J Acoust Soc Am

Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, Michigan 49008, USA.

Published: June 2011

There is a significant body of research examining the intelligibility of sinusoidal replicas of natural speech. Discussion has followed about what the sinewave speech phenomenon might imply about the mechanisms underlying phonetic recognition. However, most of this work has been conducted using sentence material, making it unclear how much of listeners' performance reflects their use of linguistic constraints as opposed to lower-level phonetic mechanisms. This study was designed to measure vowel intelligibility using sinusoidal replicas of naturally spoken vowels. The sinusoidal signals were modeled after 300 /hVd/ syllables spoken by men, women, and children. Students enrolled in an introductory phonetics course served as listeners. Recognition rates for the sinusoidal vowels averaged 55%, much lower than the ∼95% intelligibility of the original signals. Attempts to improve performance using three different training methods met with modest success, with post-training recognition rates rising by ∼5–11 percentage points. Follow-up work showed that more extensive training produced further improvements, with performance leveling off at ∼73%–74%. Finally, modeling work showed that a fairly simple pattern-matching algorithm trained on naturally spoken vowels classified sinewave vowels with 78.3% accuracy, showing that the sinewave speech phenomenon does not necessarily rule out template matching as a mechanism underlying phonetic recognition.
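A sinewave replica is conventionally built by replacing each formant with a single sinusoid that follows that formant's center frequency (and, in many implementations, its amplitude) over time. The abstract does not describe the synthesis procedure used in this study, so the following is only a minimal Python sketch under stated assumptions: each formant track moves linearly between a hypothetical start and end frequency, each sinusoid has a fixed amplitude, and the output is peak-normalized. The function name `sinewave_vowel` and all parameter values are illustrative, not taken from the study.

```python
import math


def sinewave_vowel(formants, amplitudes, duration_s=0.25, sample_rate=16000):
    """Sketch of a sinewave vowel: one sinusoid per formant track.

    Assumptions (hypothetical, for illustration only):
    - formants: list of (start_hz, end_hz) pairs; each frequency moves
      linearly from start to end across the token.
    - amplitudes: one fixed linear amplitude per sinusoid.
    """
    if len(formants) != len(amplitudes):
        raise ValueError("need one amplitude per formant track")
    n = int(round(duration_s * sample_rate))
    phases = [0.0] * len(formants)
    out = []
    for i in range(n):
        t = i / max(n - 1, 1)  # relative position in the token, 0..1
        sample = 0.0
        for k, ((f_start, f_end), amp) in enumerate(zip(formants, amplitudes)):
            freq = f_start + (f_end - f_start) * t
            # Accumulate phase so a changing frequency stays continuous.
            phases[k] += 2.0 * math.pi * freq / sample_rate
            sample += amp * math.sin(phases[k])
        out.append(sample)
    # Scale so the largest absolute sample is 1.0 (guard against silence).
    peak = max((abs(s) for s in out), default=0.0) or 1.0
    return [s / peak for s in out]
```

For example, `sinewave_vowel([(270, 270), (2290, 2290), (3010, 3010)], [1.0, 0.5, 0.25])` produces a steady three-tone token with illustrative values roughly in the range often cited for an adult male /i/; these are not measurements from the study's 300 /hVd/ syllables. Accumulating phase sample by sample, rather than computing `sin(2*pi*f*t)` directly, avoids clicks when a track's frequency changes.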

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3135151
DOI: http://dx.doi.org/10.1121/1.3573980

Similar Publications

Contribution of Temporal Fine Structure Cues to Concurrent Vowel Identification and Perception of Zebra Speech.

Int Arch Otorhinolaryngol

July 2024

Department of Audiology and Speech-Language Pathology, Kasturba Medical College, Mangalore, Manipal Academy of Higher Education, Manipal, Karnataka, India.

The limited access to temporal fine structure (TFS) cues is one reason for reduced speech-in-noise recognition in cochlear implant (CI) users. CI signal processing schemes such as electroacoustic stimulation (EAS) and fine structure processing (FSP) encode TFS in the low-frequency bands, whereas theoretical strategies such as the frequency amplitude modulation encoder (FAME) encode TFS in all bands. The present study compared the effects of simulated CI signal processing schemes that encode no TFS, TFS in all bands, or TFS only in low-frequency bands on concurrent vowel identification (CVI) and Zebra speech perception (ZSP).

Objectives: This in silico study explored the effects of a wide range of fundamental frequency (fo), source-spectrum tilt (SST), and vibrato extent (VE) on commonly used frequency and amplitude perturbation and noise measures.

Method: Using 53 synthesized tones produced in Madde, the effects of stepwise increases in fo, intensity (modeled by decreasing SST), and VE on the Praat parameters jitter % (local), relative average perturbation (RAP) %, shimmer % (local), amplitude perturbation quotient 3 (APQ3) %, and harmonics-to-noise ratio (HNR) dB were investigated. A secondary experiment was conducted to determine whether any fo effects on jitter, RAP, shimmer, APQ3, and HNR were stable.

The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region.

Purpose: The objective of this study was to determine if and how the subcortical neural representation of pitch cues in listeners with normal hearing is affected by systematic manipulation of vocoder parameters.

Method: This study assessed the effects of temporal envelope cutoff frequency (50 and 500 Hz), number of channels (1–32), and carrier type (sine-wave and noise-band) on the brainstem neural representation of fundamental frequency in frequency-following responses (FFRs) to vocoded vowels in 15 young adult listeners with normal hearing.

Results: Results showed that FFR strength (quantified as absolute magnitude divided by noise floor [NF] magnitude) significantly improved with 500-Hz vs.

Lateral temporal measures of the auditory evoked potential (AEP) including the T-complex (positive Ta and negative Tb), as well as an earlier negative peak (Na) index maturation of auditory/speech processing. Previous studies have shown that these measures distinguish neural processing in children with typical language development (TD) from those with disorders and monolingual from bilingual children. In this study, bilingual children with Turkish as L1 and German as L2 were compared with monolingual German-speaking children with developmental language disorder (DLD) and monolingual German-speaking children with TD in order to disentangle effects of limited language input vs.
