Packet loss concealment (PLC) aims to mitigate speech impairments caused by packet losses so as to improve perceptual speech quality. This paper proposes an end-to-end PLC algorithm with a time-frequency hybrid generative adversarial network, which incorporates a dilated residual convolution and the integration of a time-domain discriminator and a frequency-domain discriminator into a convolutional encoder-decoder architecture. The dilated residual convolution is employed to aggregate the short-term and long-term context information of lost speech frames through two network receptive fields with different dilation rates, and the integrated time-frequency discriminators are proposed to learn multi-resolution time-frequency features from correctly received speech frames, using both time-domain waveforms and frequency-domain complex spectra.
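As a rough illustration of how two dilation rates give a short-term and a long-term receptive field, the following minimal sketch implements a 1-D dilated convolution with a residual connection in plain Python. The kernel size, the dilation rates (1 and 4), the summation of the two branches, and all function names are assumptions made for illustration; the abstract does not specify the actual layer configuration.

```python
def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated convolution layer."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution (odd kernel length assumed)."""
    center = (len(kernel) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - center) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def dilated_residual_block(x, kernel, dilations=(1, 4)):
    """Sum a short-context and a long-context branch, then add the input (residual).

    Combining the branches by summation is an assumption of this sketch.
    """
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [x[t] + sum(b[t] for b in branches) for t in range(len(x))]


# A unit impulse shows which positions each branch can "see".
impulse = [0.0] * 9
impulse[4] = 1.0
short_branch = dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=1)
long_branch = dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=4)
block_out = dilated_residual_block(impulse, [1.0, 1.0, 1.0])
```

With a kernel of three taps, the branch with dilation 1 covers 3 neighbouring samples, while the branch with dilation 4 covers a span of 9 samples using the same number of weights.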
Traditional stereophonic acoustic echo cancellation algorithms need to estimate the acoustic echo paths from the stereo loudspeakers to a microphone, and this estimation often suffers from the nonuniqueness problem caused by the high correlation between the two far-end signals of these stereo loudspeakers. Many decorrelation methods have already been proposed to mitigate this problem. However, these methods may reduce the audio quality and/or stereophonic spatial perception.
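The nonuniqueness problem can be seen in a toy case. The sketch below assumes single-tap echo paths and two far-end channels that are exact scaled copies of each other, which is a deliberate simplification (real echo paths are long impulse responses and real stereo signals are highly but not perfectly correlated). All names and values are hypothetical.

```python
def echo(h1, h2, x1, x2):
    """Single-tap echo model: each microphone sample is h1*x1 + h2*x2."""
    return [h1 * a + h2 * b for a, b in zip(x1, x2)]


x1 = [0.5, -1.0, 2.0, 0.25]
x2 = [2.0 * v for v in x1]  # fully correlated far-end channels (assumed)

# Two different path estimates that both satisfy h1 + 2*h2 = 2.
true_echo = echo(1.0, 0.5, x1, x2)
other_echo = echo(0.0, 1.0, x1, x2)

# Largest difference between the two predicted echoes.
max_diff = max(abs(a - b) for a, b in zip(true_echo, other_echo))
```

Both path pairs predict the same echo, so an adaptive filter driven only by the echo error cannot tell them apart. Decorrelating the two channels removes this ambiguity, at the possible cost to audio quality noted above.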
Previous studies have shown the importance of applying power compression to both the features and the targets when only the magnitude is considered in the dereverberation task. When both real and imaginary components are estimated without power compression, it has been shown that it is important to take a magnitude constraint into account. In this paper, both power compression and phase estimation are considered to show their equal importance in the dereverberation task, where we propose to reconstruct the compressed real and imaginary components (cRI) for training.
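One common way to form compressed real and imaginary components is to raise the magnitude of each complex spectral bin to a power while leaving its phase unchanged. The minimal sketch below assumes that formulation and a compression exponent of 0.5; the abstract states neither, and the function names are hypothetical.

```python
import cmath


def compress_ri(spec_bin, beta=0.5):
    """Return (real, imag) of a bin whose magnitude is compressed to |X|**beta.

    The phase is kept, so the pair still carries phase information.
    """
    mag, phase = cmath.polar(spec_bin)
    compressed = cmath.rect(mag ** beta, phase)
    return compressed.real, compressed.imag


def decompress_ri(real, imag, beta=0.5):
    """Invert compress_ri: restore the original magnitude, keep the phase."""
    mag, phase = cmath.polar(complex(real, imag))
    return cmath.rect(mag ** (1.0 / beta), phase)


# Round trip on a single spectral bin with magnitude 5.
bin_value = complex(3.0, 4.0)
c_real, c_imag = compress_ri(bin_value)
restored = decompress_ri(c_real, c_imag)
```

Under these assumptions a training target built from `compress_ri` reduces the dynamic range of the spectrum while still encoding phase, which is how the two ideas discussed above can be combined in one representation.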