Partially supervised speaker clustering.

IEEE Trans Pattern Anal Mach Intell

HP Labs, 1501 Page Mill Road, Palo Alto, CA 94304, USA.

Published: May 2012

Content-based multimedia indexing, retrieval, and processing, as well as multimedia databases, demand the structuring of media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content with the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of prior knowledge about speakers in general to assist the unsupervised speaker clustering process. Using an independent training data set, we encode this prior knowledge at various stages of the speaker clustering pipeline by 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace or, equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm: linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the general graph embedding framework for dimensionality reduction.
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent improvements in speaker clustering performance compared with the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
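
The abstract does not say how the GMM mean supervector of an utterance is computed. A common construction, assumed here, adapts the means of a diagonal-covariance universal background model (UBM), standing in for the "universal speaker prior model", to the utterance's acoustic frames by relevance MAP, then stacks the adapted means into one long vector. The function names, the diagonal covariances, and the relevance factor of 16 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def responsibilities(X, weights, means, variances):
    """Posterior of each UBM component for each frame (diagonal Gaussians).

    X: (T, D) frames; weights: (K,); means, variances: (K, D).
    """
    diff = X[:, None, :] - means[None, :, :]                          # (T, K, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)      # (K,)
    log_lik = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None]).sum(axis=2)
    log_post = np.log(weights)[None, :] + log_lik                      # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)                    # stabilize exp
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def mean_supervector(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adapt the UBM means to one utterance and stack them."""
    post = responsibilities(X, weights, means, variances)   # (T, K)
    n_k = post.sum(axis=0)                                  # soft frame counts per component
    f_k = post.T @ X                                        # first-order statistics (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]              # weight of data vs. prior
    data_mean = f_k / np.maximum(n_k[:, None], 1e-10)       # per-component data mean
    adapted = alpha * data_mean + (1.0 - alpha) * means
    return adapted.reshape(-1)                              # length K * D

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D = 8, 13                                            # toy UBM size (hypothetical)
    ubm_w = np.full(K, 1.0 / K)
    ubm_mu = rng.normal(size=(K, D))
    ubm_var = np.ones((K, D))
    frames = rng.normal(size=(200, D))                      # stand-in for acoustic features
    print(mean_supervector(frames, ubm_w, ubm_mu, ubm_var).shape)
```

With a small relevance factor the supervector follows the utterance's own statistics closely; with a large one, or with few frames per component, it stays near the UBM means.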

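To illustrate the directional scattering argument, the sketch below builds toy supervectors whose direction encodes the speaker while their norm varies from utterance to utterance, and clusters them with cosine and with Euclidean distance. The average-linkage agglomerative procedure and the toy data generator are assumptions for illustration; the abstract does not name the clustering algorithm used.

```python
import numpy as np

def cosine_distance_matrix(S):
    """Pairwise 1 - cos(angle) between rows of S (one supervector per utterance)."""
    U = S / np.linalg.norm(S, axis=1, keepdims=True)
    return np.clip(1.0 - U @ U.T, 0.0, 2.0)

def euclidean_distance_matrix(S):
    """Pairwise Euclidean distance between rows of S."""
    sq = (S ** 2).sum(axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * S @ S.T, 0.0))

def agglomerative(D, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = (np.inf, -1, -1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()   # average linkage
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]                  # merge closest pair
        del clusters[b]
    labels = np.empty(D.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    directions = rng.normal(size=(2, 20))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    speaker = np.repeat([0, 1], 4)                # two hypothetical speakers
    scale = np.tile([0.5, 3.0], 4)                # utterance-dependent norm
    S = scale[:, None] * (directions[speaker] + 0.05 * rng.normal(size=(8, 20)))
    print("cosine:   ", agglomerative(cosine_distance_matrix(S), 2))
    print("euclidean:", agglomerative(euclidean_distance_matrix(S), 2))
```

Because cosine distance ignores vector length, it groups these toy utterances by direction, whereas Euclidean distance is also driven by the norm differences.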
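
The abstract does not give the LSDA objective itself, so the following is only a simplified stand-in in the same spirit, not the authors' LSDA algorithm: supervectors are first length-normalized onto the unit sphere (so that only their direction, the quantity cosine distance compares, remains), and classical LDA is then solved in graph-embedding form, with an intrinsic graph linking utterances of the same training speaker and a centering matrix as the penalty graph. The function names, the ridge term `reg`, and the choice of LDA graphs are assumptions.

```python
import numpy as np

def scatter_matrices(S, labels, reg=1e-3):
    """Within-class and total scatter of length-normalized rows of S, via graph Laplacians."""
    X = S / np.linalg.norm(S, axis=1, keepdims=True)     # put supervectors on the unit sphere
    N, D = X.shape
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    W = same / same.sum(axis=1, keepdims=True)            # intrinsic graph: W_ij = 1/n_c within class c
    L = np.diag(W.sum(axis=1)) - W                        # Laplacian of the intrinsic graph
    B = np.eye(N) - np.full((N, N), 1.0 / N)              # penalty graph: centering matrix
    Sw = X.T @ L @ X + reg * np.eye(D)                    # within-class scatter (+ ridge)
    St = X.T @ B @ X + reg * np.eye(D)                    # total scatter (+ ridge)
    return Sw, St

def graph_embedding_lda(S, labels, n_dims, reg=1e-3):
    """Directions w minimizing (w^T Sw w) / (w^T St w): LDA in graph-embedding form."""
    Sw, St = scatter_matrices(S, labels, reg)
    Cinv = np.linalg.inv(np.linalg.cholesky(St))          # St = C C^T; whiten with C^-1
    M = Cinv @ Sw @ Cinv.T
    _, V = np.linalg.eigh(0.5 * (M + M.T))               # eigenvalues in ascending order
    return Cinv.T @ V[:, :n_dims]                         # smallest ratios = most discriminative

def project(S, P):
    """Map supervectors into the learned subspace and renormalize for cosine comparison."""
    Y = (S / np.linalg.norm(S, axis=1, keepdims=True)) @ P
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)
```

Under this reading, a projection learned on the independent training speakers would be applied to the test supervectors before cosine-distance clustering, one way to realize the abstract's "discriminative speaker subspace".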

DOI: http://dx.doi.org/10.1109/TPAMI.2011.174