Utterance clustering is an actively researched topic in audio signal processing and machine learning. This study aims to improve utterance clustering performance by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining the left- and right-channel audio signals in a few different ways, and embedded features (also called d-vectors) were then extracted from those processed signals. The study applied a Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Experiments with real audio recordings of multiperson discussion sessions showed that the proposed method using multichannel audio signals performed significantly better than a conventional method using mono audio signals under more complicated conditions.
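The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three channel mixes (`mean`, `sum`, `difference`), the diagonal-covariance form of the mixture, and all toy parameters and names below are assumptions made for illustration. The d-vector extraction network and the training of the parameter-sharing model are omitted, so the short 2-D vectors stand in for embedded features and the mixture parameters are simply given. Only the testing-phase rule, selecting the speaker whose model yields the maximum likelihood, is taken from the text.

```python
import math


def combine_channels(left, right):
    """Return a few hypothetical mono mixes of a stereo pair (lists of samples)."""
    return {
        "mean": [(l + r) / 2 for l, r in zip(left, right)],
        "sum": [l + r for l, r in zip(left, right)],
        "difference": [l - r for l, r in zip(left, right)],
    }


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def detect_speaker(features, speaker_gmms):
    """Select the speaker whose mixture gives the highest total log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(x, gmm) for x in features)
        for name, gmm in speaker_gmms.items()
    }
    return max(scores, key=scores.get), scores


# Toy per-speaker mixtures (assumed values, not taken from the study).
speaker_gmms = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Stand-ins for the embedded features of one test utterance.
utterance = [[0.2, -0.1], [0.9, 1.1]]
detected, scores = detect_speaker(utterance, speaker_gmms)
print(detected)
```

In a fuller version, each mix returned by `combine_channels` would be passed through a feature extractor before scoring, and the mixture parameters would be estimated from labeled training utterances rather than written by hand.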

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8487827
DOI: http://dx.doi.org/10.1155/2021/6151651

Publication Analysis

Top Keywords

audio signals: 20
utterance clustering: 16
audio: 8
stereo audio: 8
processed audio: 8
gaussian mixture: 8
mixture model: 8
signals: 6
utterance: 4
clustering stereo: 4

Similar Publications

Music pre-processing methods are currently becoming a recognized area of research with the goal of making music more accessible to listeners with a hearing impairment. Our previous study showed that hearing-impaired listeners preferred spectrally manipulated multi-track mixes. Nevertheless, the acoustical basis of mixing for hearing-impaired listeners remains poorly understood.

Resistive memory-based zero-shot liquid state machine for multimodal event data learning.

Nat Comput Sci

January 2025

Key Lab of Fabrication Technologies for Integrated Circuits and Key Laboratory of Microelectronic Devices and Integrated Technology, Institute of Microelectronics of the Chinese Academy of Sciences, Beijing, China.

The human brain is a complex spiking neural network (SNN) capable of learning multimodal signals in a zero-shot manner by generalizing existing knowledge. Remarkably, it maintains minimal power consumption through event-based signal propagation. However, replicating the human brain in neuromorphic hardware presents both hardware and software challenges.

An Electroglottographic and Acoustic Study on Mandarin Speech in Male Heroin Users.

J Voice

January 2025

Department of Audio, Video, and Electronic Forensics, Academy of Forensic Science, Shanghai, China; Shanghai Forensic Service Platform, Key Laboratory of Forensic Science, Ministry of Justice, Shanghai, China.

Drug abuse can cause severe damage to the human speech organs. The vocal folds are among the most important of these organs, producing voice by vibrating as airflow passes through them. Previous studies have reported negative effects of drugs on the speech organs, including the vocal folds, but research in this area remains limited.

Introduction: Vocal distortion, also known as a scream or growl, is used worldwide as an essential technique in singing, especially in rock and metal, and as an ethnic voice in Mongolian singing. However, the production mechanism of vocal distortion is not yet clearly understood owing to limited research on the behavior of the larynx, which is the source of the distorted voice.

Objectives: This study used high-speed digital imaging (HSDI) to observe the larynx of professional singers with exceptional singing skills and determine the laryngeal dynamics in the voice production of various vocal distortions.

Background: Cochlear implants (CIs) are neuroprosthetic devices which restore hearing in severe-to-profound hearing loss through electrical stimulation of the auditory nerve. Current CIs use an externally worn audio processor. A long-term goal in the field has been to develop a device in which all components are contained within a single implant.
