A PHP Error was encountered

Severity: Warning

Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests

Filename: helpers/my_audit_helper.php

Line Number: 176

Backtrace:

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3122
Function: getPubMedXML

File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global

File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword

File: /var/www/html/index.php
Line: 316
Function: require_once

Combination of deep speaker embeddings for diarisation. | LitMetric

Combination of deep speaker embeddings for diarisation.

Neural Netw

Cambridge University Engineering Department, Trumpington Street, Cambridge, CB2 1PZ, UK. Electronic address:

Published: September 2021

Significant progress has recently been made in speaker diarisation after the introduction of d-vectors as speaker embeddings extracted from neural network (NN) speaker classifiers for clustering speech segments. To extract better-performing and more robust speaker embeddings, this paper proposes a c-vector method by combining multiple sets of complementary d-vectors derived from systems with different NN components. Three structures are used to implement the c-vectors, namely 2D self-attentive, gated additive, and bilinear pooling structures, relying on attention mechanisms, a gating mechanism, and a low-rank bilinear pooling mechanism respectively. Furthermore, a neural-based single-pass speaker diarisation pipeline is also proposed in this paper, which uses NNs to achieve voice activity detection, speaker change point detection, and speaker embedding extraction. Experiments and detailed analyses are conducted on the challenging AMI and NIST RT05 datasets which consist of real meetings with 4-10 speakers and a wide range of acoustic conditions. For systems trained on the AMI training set, relative speaker error rate (SER) reductions of 13% and 29% are obtained by using c-vectors instead of d-vectors on the AMI dev and eval sets respectively, and a relative SER reduction of 15% in SER is observed on RT05, which shows the robustness of the proposed methods. By incorporating VoxCeleb data into the training set, the best c-vector system achieved 7%, 17% and 16% relative SER reduction compared to the d-vector on the AMI dev, eval and RT05 sets respectively.

Download full-text PDF

Source
http://dx.doi.org/10.1016/j.neunet.2021.04.020DOI Listing

Publication Analysis

Top Keywords

speaker embeddings
12
speaker
9
speaker diarisation
8
bilinear pooling
8
detection speaker
8
training set
8
ami dev
8
dev eval
8
relative ser
8
ser reduction
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!