Utterance Clustering Using Stereo Audio Channels.

Yingjun Dong, Neil G MacLaren, Yiding Cao, Francis J Yammarino, Shelley D Dionne, Michael D Mumford, Shane Connelly, Hiroki Sayama, Gregory A Ruark

Comput Intell Neurosci

U.S. Army Research Institute for the Behavioral and Social Sciences, Fort Belvoir, VA, USA.

Published: October 2021

Utterance clustering is an actively researched topic in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining the left- and right-channel audio signals in a few different ways, and embedded features (also called d-vectors) were then extracted from those processed signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Experiments with real audio recordings of multiperson discussion sessions showed that the proposed method, which used multichannel audio signals, achieved significantly better performance than a conventional method with mono audio signals in more complicated conditions.
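The pipeline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, the sum and difference combinations are only examples of "a few different ways" of combining channels, the embeddings are toy two-dimensional vectors standing in for d-vectors (no neural feature extractor is shown), and each speaker is modeled with a single diagonal Gaussian, which is a one-component simplification of the parameter-sharing Gaussian mixture model named in the abstract. Only the decision rule (choose the speaker whose model gives the maximum likelihood) follows the text directly.

```python
import math


def combine_channels(left, right):
    """Return a few hypothetical combinations of a stereo sample pair."""
    return {
        "left": list(left),
        "right": list(right),
        "sum": [l + r for l, r in zip(left, right)],
        "diff": [l - r for l, r in zip(left, right)],
    }


def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Fit a per-dimension mean and variance to one speaker's embeddings.

    Assumption: a single diagonal Gaussian replaces the GMM of the paper.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [
        max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(x, model):
    """Log-density of embedding x under a diagonal Gaussian model."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
        for xi, m, s in zip(x, mean, var)
    )


def detect_speaker(x, models):
    """Select the speaker whose model assigns x the maximum likelihood."""
    return max(models, key=lambda name: log_likelihood(x, models[name]))


# Toy training embeddings (placeholders for d-vectors), one list per speaker.
train = {
    "A": [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]],
    "B": [[3.0, 3.1], [2.9, 3.0], [3.2, 2.8]],
}
models = {name: fit_diag_gaussian(vecs) for name, vecs in train.items()}

print(detect_speaker([0.1, 0.0], models))
print(detect_speaker([3.0, 3.0], models))
```

In a fuller version, each combined signal from `combine_channels` would be passed through an embedding extractor, and the resulting vectors would replace the toy lists in `train`.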

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8487827
DOI: http://dx.doi.org/10.1155/2021/6151651

Publication Analysis

Top Keywords: audio signals, utterance clustering, audio, stereo audio, processed audio, gaussian mixture, mixture model, signals, utterance, clustering stereo

Similar Publications

Effects of spectral manipulations of music mixes on musical scene analysis abilities of hearing-impaired listeners.

PLoS One

January 2025

Dept. of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany.

Aravindan Joseph Benjamin Kai Siedenburg

Music pre-processing methods are currently becoming a recognized area of research with the goal of making music more accessible to listeners with a hearing impairment. Our previous study showed that hearing-impaired listeners preferred spectrally manipulated multi-track mixes. Nevertheless, the acoustical basis of mixing for hearing-impaired listeners remains poorly understood.

Resistive memory-based zero-shot liquid state machine for multimodal event data learning.

Nat Comput Sci

January 2025

Key Lab of Fabrication Technologies for Integrated Circuits and Key Laboratory of Microelectronic Devices and Integrated Technology, Institute of Microelectronics of the Chinese Academy of Sciences, Beijing, China.

Ning Lin Shaocong Wang Yi Li Bo Wang Shuhui Shi

The human brain is a complex spiking neural network (SNN) capable of learning multimodal signals in a zero-shot manner by generalizing existing knowledge. Remarkably, it maintains minimal power consumption through event-based signal propagation. However, replicating the human brain in neuromorphic hardware presents both hardware and software challenges.

An Electroglottographic and Acoustic Study on Mandarin Speech in Male Heroin Users.

J Voice

January 2025

Department of Audio, Video, and Electronic Forensics, Academy of Forensic Science, Shanghai, China; Shanghai Forensic Service Platform, Key Laboratory of Forensic Science, Ministry of Justice, Shanghai, China.

Puyang Geng Ningxue Fan Rong Ling Zhijun Li Hong Guo

Drug abuse can cause severe damage to the human speech organs. The vocal folds are among the most important speech organs, producing voice by vibrating as airflow passes through them. Previous studies have reported the negative effects of drugs on speech organs, including the vocal folds, but research in this field remains limited.

Analysis and Categorization of Various Types of Vocal Distortion in Rock, Metal, Pop Styles, and Throat Singing Observed by High-Speed Digital Imaging.

J Voice

January 2025

Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan.

Yogaku Lee Masato Tanaka Hikari Kato Takashi Nakagawa Satoshi Ishikawa

Introduction: Vocal distortion, also known as a scream or growl, is used worldwide as an essential technique in singing, especially in rock and metal, and as an ethnic voice in Mongolian singing. However, the production mechanism of vocal distortion is not yet clearly understood owing to limited research on the behavior of the larynx, which is the source of the distorted voice.

Objectives: This study used high-speed digital imaging (HSDI) to observe the larynx of professional singers with exceptional singing skills and determine the laryngeal dynamics in the voice production of various vocal distortions.

Rehabilitation of human hearing with a totally implantable cochlear implant: a feasibility study.

Commun Med (Lond)

January 2025

MED-EL Elektromedizinische Geräte GmbH, Fürstenweg 77a, 6020, Innsbruck, Austria.

Philippe Pierre Lefebvre Joachim Müller Gerhard Mark Florian Schwarze Ingeborg Hochmair

Background: Cochlear implants (CIs) are neuroprosthetic devices which restore hearing in severe-to-profound hearing loss through electrical stimulation of the auditory nerve. Current CIs use an externally worn audio processor. A long-term goal in the field has been to develop a device in which all components are contained within a single implant.
