HiddenSinger: High-quality singing voice synthesis via neural audio codec and latent diffusion models.

Neural Netw

Department of Artificial Intelligence, Korea University, 02841, Seoul, Republic of Korea.

Published: January 2025

Recently, denoising diffusion models have demonstrated remarkable performance among generative models in various domains. However, in the speech domain, applying diffusion models to time-varying audio synthesis remains limited by model complexity and controllability. In particular, singing voice synthesis (SVS), which has begun to emerge as a practical application in the game and entertainment industries, requires high-dimensional samples with long-term acoustic features. To alleviate the challenges posed by model complexity in SVS, we propose HiddenSinger, a high-quality SVS system using a neural audio codec and latent diffusion models. To ensure high-fidelity audio, we introduce an audio autoencoder that encodes audio into a compressed audio-codec representation and reconstructs high-fidelity audio from this low-dimensional latent vector. We then use latent diffusion models to sample a latent representation conditioned on a musical score. In addition, we extend the model to HiddenSinger-U, an unsupervised singing voice learning framework that can be trained on an unlabeled singing voice dataset. Experimental results demonstrate that our model outperforms previous models in terms of audio quality. Furthermore, HiddenSinger-U can synthesize high-quality singing voices for speakers whose training data were entirely unlabeled.
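The abstract describes two stages: an autoencoder that compresses audio into a low-dimensional latent, and a diffusion model that operates on that latent. The following is a minimal sketch of the generic latent-diffusion training ingredients (DDPM-style forward noising in latent space and the noise-prediction loss) on a toy 1-D latent. It is not HiddenSinger's implementation. `downsample_latent` is a hypothetical stand-in for the learned codec encoder, the linear beta schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default rather than a setting stated here, and the musical-score-conditioned denoiser is replaced by a zero placeholder.

```python
import math
import random
from typing import List, Sequence


def downsample_latent(frames: Sequence[float], factor: int) -> List[float]:
    """Hypothetical stand-in for the audio autoencoder's encoder.

    Average-pools a 1-D feature sequence by `factor`, shrinking its length.
    A real neural audio codec encoder is a learned network, not fixed pooling.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n = len(frames) // factor  # drop any trailing partial window
    return [sum(frames[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(z0: Sequence[float], alpha_bar: float,
             noise: Sequence[float]) -> List[float]:
    """Forward diffusion: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * noise."""
    if len(z0) != len(noise):
        raise ValueError("z0 and noise must have the same length")
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, noise)]


def noise_prediction_loss(predicted: Sequence[float],
                          target: Sequence[float]) -> float:
    """Mean squared error between predicted and true noise (epsilon-prediction)."""
    return sum((p - e) ** 2 for p, e in zip(predicted, target)) / len(target)


if __name__ == "__main__":
    rng = random.Random(0)
    audio_features = [math.sin(0.01 * i) for i in range(1024)]  # toy "audio"
    z0 = downsample_latent(audio_features, factor=8)            # 128 latent frames
    alpha_bars = linear_alpha_bars(1000)
    t = 500
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = q_sample(z0, alpha_bars[t], noise)
    # A denoiser conditioned on (z_t, t, musical-score features) would predict
    # `noise` here; a zero predictor is a placeholder for that network.
    loss = noise_prediction_loss([0.0] * len(noise), noise)
    print(len(z0), round(alpha_bars[t], 4), round(loss, 3))
```

Because diffusion runs on the compressed latent, each denoising step handles `len(frames) // factor` values instead of the full feature sequence, which is the general motivation for latent diffusion over diffusion on raw high-dimensional audio features.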

Source: http://dx.doi.org/10.1016/j.neunet.2024.106762
