Deepfakes are viral ingredients of digital environments, and they can trick human cognition into misperceiving the fake as real. Here, we test the neurocognitive sensitivity of 25 participants to accept or reject person identities as recreated in audio deepfakes. We generate high-quality voice identity clones from natural speakers by using advanced deepfake technologies.
JASA Express Lett
January 2024
Distinguishing shouted from non-shouted speech is crucial in communication. We examined how shouting affects temporal properties of the amplitude envelope (ENV) in a total of 720 sentences read by 18 Swiss German speakers in normal and shouted modes; shouting was characterised by maintaining sound pressure levels of ≥80 dB SPL (C-weighted) at a distance of 1 m from the mouth. Generalized additive models revealed significant temporal alterations of ENV in shouted speech, marked by steeper ascent, delayed peak, and extended high levels.
Human voice recognition over telephone channels typically yields lower accuracy when compared to audio recorded in a studio environment with higher quality. Here, we investigated the extent to which audio in video conferencing, subject to various lossy compression mechanisms, affects human voice recognition performance. Voice recognition performance was tested in an old-new recognition task under three audio conditions (telephone, Zoom, studio) across all matched (familiarization and test with same audio condition) and mismatched combinations (familiarization and test with different audio conditions).
Introduction: Cooperation, acoustically signaled through vocal convergence, is facilitated when group members are more similar. Excessive vocal convergence may, however, weaken individual recognizability. This study aimed to explore whether constraints to convergence can arise in circumstances where interlocutors need to enhance their vocal individuality.
The human auditory system is capable of processing human speech even in situations when it has been heavily degraded, such as during noise-vocoding, when frequency domain-based cues to phonetic content are strongly reduced. This has contributed to arguments that speech processing is highly specialized and likely a de novo evolved trait in humans. Previous comparative research has demonstrated that a language-competent chimpanzee was also capable of recognizing degraded speech, and therefore that the mechanisms underlying speech processing may not be uniquely human.
J Acoust Soc Am
October 2021
Foreign-accented speech typically deviates segmentally and suprasegmentally from native-accented speech. Two experiments were conducted to investigate the role of amplitude envelope (ENV), segment duration (DUR), and speech rate (SR) on Italian listeners' ability to identify native-accented Italian in utterances produced by Zurich German speakers. In experiment 1, listeners judged in a two-alternative forced-choice perception task which of the two stimuli in a trial they perceived as more native-like.
An unsupervised automatic clustering algorithm (k-means) classified 1282 Mel frequency cepstral coefficient (MFCC) representations of isolated steady-state vowel utterances from eight standard German vowel categories with fundamental frequency (fo) between 196 and 698 Hz. Experiment I obtained the number of MFCCs (1-20) in connection with the spectral bandwidth (2-20 kHz) at which performance peaked (five MFCCs at 4 kHz). In experiment II, classification performance with different ranges of fo revealed that ranges with fo > 500 Hz reduced classification performance but it remained well above chance.
Age-related decline in speech perception may result in difficulties partaking in spoken conversation and potentially lead to social isolation and cognitive decline in older adults. It is therefore important to better understand how age-related differences in neurostructural factors such as cortical thickness (CT) and cortical surface area (CSA) are related to neurophysiological sensitivity to speech cues in younger and older adults. Age-related differences in CT and CSA of bilateral auditory-related areas were extracted using FreeSurfer in younger and older adults with normal peripheral hearing.
J Acoust Soc Am
March 2019
First formant (F1) trajectories of vocalic intervals were divided into positive and negative dynamics. Positive F1 dynamics were defined as the speeds of F1 increases to reach the maxima, and negative F1 dynamics as the speeds of F1 decreases away from the maxima. Mean, standard deviation, and sequential variability were measured for both dynamics.
The perception of stress is highly influenced by listeners' native language. In this research, the authors examined the effect of intonation and talker variability (here: phonetic variability) in the discrimination of Spanish lexical stress contrasts by native Spanish (N = 17), German (N = 21), and French (N = 27) listeners. Participants listened to 216 trials containing three Spanish disyllabic words, where one word carried a different lexical stress to the others.
The phonological function of vowels can be maintained at fundamental frequencies (fo) up to 880 Hz [Friedrichs, Maurer, and Dellwo (2015). J. Acoust.
J Acoust Soc Am
May 2017
Intensity contours of speech signals were sub-divided into positive and negative dynamics. Positive dynamics were defined as the speed of increases in intensity from amplitude troughs to subsequent peaks, and negative dynamics as the speed of decreases in intensity from peaks to troughs. Mean, standard deviation, and sequential variability were measured for both dynamics in each sentence.
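The positive and negative dynamics described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the strict-inequality peak/trough detection, the use of population standard deviation, and the choice of mean absolute successive difference as a stand-in for "sequential variability" are all assumptions made here for the sake of the example.

```python
from statistics import mean, pstdev


def find_extrema(values):
    """Return (index, kind) pairs for strict local peaks and troughs, in time order.

    Assumption: plateaus are ignored (strict inequalities only).
    """
    extrema = []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            extrema.append((i, "peak"))
        elif values[i - 1] > values[i] < values[i + 1]:
            extrema.append((i, "trough"))
    return extrema


def dynamics(times, values):
    """Split an intensity contour into positive and negative dynamics.

    Positive: speed of increase from a trough to the next peak.
    Negative: speed of decrease from a peak to the next trough,
    reported as a positive magnitude (an assumption).
    """
    extrema = find_extrema(values)
    positive, negative = [], []
    for (i, kind_i), (j, kind_j) in zip(extrema, extrema[1:]):
        slope = (values[j] - values[i]) / (times[j] - times[i])
        if kind_i == "trough" and kind_j == "peak":
            positive.append(slope)
        elif kind_i == "peak" and kind_j == "trough":
            negative.append(-slope)
    return positive, negative


def summarize(speeds):
    """Mean, SD, and a hypothetical sequential-variability measure."""
    if not speeds:
        raise ValueError("no dynamics found in this contour")
    successive = [abs(b - a) for a, b in zip(speeds, speeds[1:])]
    return {
        "mean": mean(speeds),
        "sd": pstdev(speeds),  # population SD; an assumption
        "sequential_variability": mean(successive) if successive else 0.0,
    }


# Toy contour: time in seconds, intensity in dB (made-up values).
toy_times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_values = [60, 50, 70, 55, 80, 60, 66]
toy_positive, toy_negative = dynamics(toy_times, toy_values)
print(summarize(toy_positive))
print(summarize(toy_negative))
```

In this sketch the speeds come out in dB per second because the toy times are in seconds; a real contour would first need smoothing and a principled extremum detector.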
This EEG study aims to investigate age-related differences in the neural oscillation patterns during the processing of temporally modulated speech. Taking a lifespan perspective, we recorded the electroencephalogram (EEG) data of three age samples: young adults, middle-aged adults and older adults. Stimuli consisted of temporally degraded sentences in Swedish, a language unfamiliar to all participants.
In a between-subject perception task, listeners either identified full words or vowels isolated from these words at F0s between 220 and 880 Hz. They received two written words as response options (minimal pair with the stimulus vowel in contrastive position). Listeners' sensitivity (A') was extremely high in both conditions at all F0s, showing that the phonological function of vowels can also be maintained at high F0s.
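For readers unfamiliar with the A' sensitivity index mentioned above, the commonly used nonparametric formula can be sketched as below. The abstract does not say how hits and false alarms were defined in this task, so the function name `a_prime` and its two rate arguments are illustrative assumptions rather than the study's procedure.

```python
def a_prime(hit_rate, false_alarm_rate):
    """Nonparametric sensitivity index A'.

    0.5 corresponds to chance and 1.0 to perfect sensitivity.
    Both arguments are proportions in [0, 1].
    """
    h, f = hit_rate, false_alarm_rate
    if not (0.0 <= h <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("rates must lie between 0 and 1")
    if h == f:
        return 0.5
    if h > f:
        # Here h > 0 and f < 1, so the denominator is non-zero.
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case, mirrored around 0.5.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Made-up rates for illustration only.
print(a_prime(0.9, 0.1))
```

A' is bounded and remains defined when the hit rate is 1 or the false-alarm rate is 0, which makes it convenient when performance is close to ceiling.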
Between-speaker variability of acoustically measurable speech rhythm [%V, ΔV(ln), ΔC(ln), and Δpeak(ln)] was investigated when within-speaker variability of (a) articulation rate and (b) linguistic structural characteristics was introduced. To study (a), 12 speakers of Standard German read seven lexically identical sentences under five different intended tempo conditions (very slow, slow, normal, fast, very fast). To study (b), 16 speakers of Zurich Swiss German produced 16 spontaneous utterances each (256 in total) for which transcripts were made and then read by all speakers (4096 sentences; 16 speakers × 256 sentences).
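Three of the rhythm measures named above can be computed from a segmented utterance along the following lines. This is a sketch under stated assumptions: the input format (a list of labelled interval durations), the function name `rhythm_metrics`, and the use of population standard deviation are choices made here, and Δpeak(ln) is omitted because it requires amplitude-envelope peak detection that the abstract does not describe.

```python
import math
from statistics import pstdev


def rhythm_metrics(intervals):
    """Compute %V, deltaV(ln), and deltaC(ln) for one utterance.

    intervals: list of (label, duration) pairs, where label is "V" for a
    vocalic interval or "C" for a consonantal interval (hypothetical format).
    """
    vocalic = [d for label, d in intervals if label == "V"]
    consonantal = [d for label, d in intervals if label == "C"]
    total = sum(vocalic) + sum(consonantal)
    return {
        # Percentage of utterance duration that is vocalic.
        "percent_V": 100.0 * sum(vocalic) / total,
        # SD of natural-log-transformed interval durations.
        "delta_V_ln": pstdev([math.log(d) for d in vocalic]),
        "delta_C_ln": pstdev([math.log(d) for d in consonantal]),
    }


# Made-up interval durations in seconds.
example = [("C", 0.08), ("V", 0.12), ("C", 0.15), ("V", 0.09), ("C", 0.06), ("V", 0.20)]
print(rhythm_metrics(example))
```

Because the standard deviation of log durations is unaffected by the unit of measurement, the result is the same whether durations are given in seconds or milliseconds.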
Everyday experience tells us that it is often possible to identify a familiar speaker solely by his/her voice. Such observations reveal that speakers carry individual features in their voices. The present study examines how suprasegmental temporal features contribute to speaker-individuality.
Integrating visual and auditory language information is critical for reading. Suppression and congruency effects in audiovisual paradigms with letters and speech sounds have provided information about low-level mechanisms of grapheme-phoneme integration during reading. However, the central question about how such processes relate to reading entire words remains unexplored.