EURASIP J Audio Speech Music Process
November 2024
In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance on moderately reverberant and noisy scenarios. Many works have analysed and extended the original SRP method to reduce its computational cost, to allow it to locate multiple sources, or to improve its performance in adverse environments. In this work, we review over 200 papers on the SRP method and its variants, with emphasis on the SRP-PHAT method.
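The core of SRP-PHAT can be sketched as follows. This is a minimal illustration built on assumptions, not the method of any reviewed paper: it uses a single frame per microphone, free-field propagation, a speed of sound of 343 m/s, and a fixed grid of candidate positions. The helper names `srp_phat` and `simulate_frames` are hypothetical. For each microphone pair, the PHAT-weighted cross-spectrum is steered to the time difference of arrival (TDOA) implied by each candidate point. The steered responses are then summed over all pairs, and the candidate with the largest power is the location estimate.

```python
import numpy as np


def srp_phat(frames, mic_pos, candidates, fs, c=343.0):
    """Steered response power with PHAT weighting over a grid of candidate points.

    frames: (M, N) one time-domain frame per microphone
    mic_pos: (M, D) microphone coordinates in metres
    candidates: (Q, D) candidate source coordinates in metres
    Returns a (Q,) array of steered response power values.
    """
    M, N = frames.shape
    X = np.fft.rfft(frames, axis=1)                     # (M, K) spectra
    omega = 2 * np.pi * np.fft.rfftfreq(N, d=1.0 / fs)  # (K,) angular frequencies
    # Distance from every candidate to every microphone: (Q, M)
    dist = np.linalg.norm(candidates[:, None, :] - mic_pos[None, :, :], axis=2)
    power = np.zeros(len(candidates))
    for i in range(M):
        for j in range(i + 1, M):
            cross = X[i] * np.conj(X[j])
            cross /= np.abs(cross) + 1e-12              # PHAT: keep phase only
            tau = (dist[:, i] - dist[:, j]) / c         # (Q,) TDOA implied by each candidate
            # Compensate the expected phase difference, then sum over frequency (real part)
            power += np.real(np.exp(1j * np.outer(tau, omega)) @ cross)
    return power


def simulate_frames(s, mic_pos, src, fs, c=343.0):
    """Toy free-field model: each microphone receives s delayed by |src - m| / c.

    The fractional delays are applied circularly in the frequency domain.
    """
    N = len(s)
    S = np.fft.rfft(s)
    omega = 2 * np.pi * np.fft.rfftfreq(N, d=1.0 / fs)
    delays = np.linalg.norm(mic_pos - src, axis=1) / c
    return np.stack([np.fft.irfft(S * np.exp(-1j * omega * d), n=N) for d in delays])
```

Consider four microphones on a 0.2 m square and a 4 × 4 grid of candidate points. In that setup, `candidates[np.argmax(srp_phat(frames, mic_pos, candidates, fs))]` recovers the simulated source position. Every pair contributes one term per frequency bin and grid point. The cost therefore scales with the product of pairs, bins and candidates, which is the burden that reduced-cost SRP variants aim to lower.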
EURASIP J Audio Speech Music Process
October 2024
EURASIP J Audio Speech Music Process
September 2024
Room impulse responses (RIRs) are used in several applications, such as augmented reality and virtual reality. These applications require a large number of RIRs to be convolved with audio, under strict latency constraints. In this paper, we consider the compression of RIRs, in conjunction with fast time-domain convolution.
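The abstract does not specify the compression scheme. The following is therefore only a hedged sketch of the two ingredients it names. As a stand-in for compression, it truncates the RIR tail once a chosen fraction of the energy has been retained; this is an assumption, not the paper's method. It then applies direct time-domain FIR convolution, whose cost per output sample grows linearly with the RIR length. `truncate_rir` and `time_domain_convolve` are hypothetical names.

```python
import numpy as np


def truncate_rir(h, energy_fraction=0.999):
    """Hypothetical compression: keep the shortest prefix of h holding energy_fraction of its energy."""
    h = np.asarray(h, dtype=float)
    cum = np.cumsum(h ** 2)
    n = int(np.searchsorted(cum, energy_fraction * cum[-1])) + 1
    return h[:n]


def time_domain_convolve(x, h):
    """Direct-form FIR convolution y[n] = sum_k h[k] * x[n - k].

    One multiply-add pass over x per RIR tap, so a shorter RIR is proportionally cheaper.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x) + len(h) - 1)
    for k, hk in enumerate(h):
        y[k:k + len(x)] += hk * x
    return y
```

For an exponentially decaying RIR such as `0.5 ** np.arange(20)`, keeping 99.9 % of the energy leaves only the first five taps. Convolving with the truncated response then costs about a quarter of the full-length convolution.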
Sound zone methods aim to control the sound field produced by an array of loudspeakers to render given audio content in specific areas while making it almost inaudible in others. At low frequencies, control filters are based on information about the electro-acoustical path between loudspeakers and listening areas, contained in the room impulse responses (RIRs). This information can be acquired wirelessly through ubiquitous networks of microphones.
In the development of acoustic signal processing algorithms, their evaluation in various acoustic environments is of utmost importance. In order to advance evaluation in realistic and reproducible scenarios, several high-quality acoustic databases have been developed over the years. In this paper, we present another complementary database of acoustic recordings, referred to as the Multi-arraY Room Acoustic Database (MYRiAD).
Front Neuroinform
November 2022
Recent deep-neural-network-based methods provide accurate binaural source localization performance. These data-driven models map measured binaural cues directly to source locations; hence, their performance depends heavily on the distribution of the training data. In this paper, we propose a parametric embedding that maps the binaural cues to a low-dimensional space where localization can be done with a nearest-neighbor regression.
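Once the cues have been mapped to a low-dimensional space, nearest-neighbor regression is straightforward. The sketch below assumes the embeddings are already computed and that the target is azimuth in degrees; the paper's parametric embedding itself is not reproduced. Because azimuth wraps around at 360°, the sketch averages the k nearest training azimuths with a circular mean. `knn_azimuth` is a hypothetical name.

```python
import numpy as np


def knn_azimuth(train_z, train_az_deg, query_z, k=3):
    """Predict azimuth (degrees) for each query embedding from its k nearest training embeddings.

    A circular mean is used, so neighbors at 350 and 10 degrees average to 0, not 180.
    """
    train_z = np.asarray(train_z, dtype=float)
    query_z = np.atleast_2d(np.asarray(query_z, dtype=float))
    # Euclidean distances between every query and every training point: (Q, T)
    d = np.linalg.norm(query_z[:, None, :] - train_z[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]                       # (Q, k) nearest indices
    az = np.deg2rad(np.asarray(train_az_deg, dtype=float))[idx]
    mean = np.arctan2(np.sin(az).mean(axis=1), np.cos(az).mean(axis=1))
    return np.rad2deg(mean) % 360.0
```

The circular mean matters near the 0°/360° boundary, where an arithmetic mean of neighboring azimuths would point in the opposite direction.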
This paper proposes an experimental setup for measuring the sound radiation of a quadrotor drone using a hemispherical microphone array. The measured sound field is decomposed into spherical harmonics, which enables the evaluation of the radiation pattern at non-probed positions. Additionally, the measurement setup allows the assessment of noise emission and psychoacoustic metrics at a wide range of angles.
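The decomposition step can be illustrated with a least-squares fit of real spherical harmonics. This sketch rests on simplifying assumptions. It stops at order 1, with four basis functions written out explicitly so that no special-function library is needed. It also handles a single frequency with real-valued pressures. The paper's measurement setup is not modelled, and `real_sh_order1`, `fit_sh`, `evaluate_sh` and `unit_vectors` are hypothetical names. The coefficients are fitted on upper-hemisphere directions and then evaluated below the horizon, which shows how they extend the pattern to unprobed directions.

```python
import numpy as np


def real_sh_order1(dirs):
    """Real spherical harmonics up to order 1 for unit vectors dirs (P, 3).

    Columns: Y_00, Y_1,-1 (~ y), Y_1,0 (~ z), Y_1,1 (~ x); orthonormal on the sphere
    (sign convention aside).
    """
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 / np.sqrt(np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.column_stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x])


def fit_sh(dirs, pressure):
    """Least-squares SH coefficients from pressures measured at directions dirs."""
    coeffs, *_ = np.linalg.lstsq(real_sh_order1(dirs), pressure, rcond=None)
    return coeffs


def evaluate_sh(coeffs, dirs):
    """Reconstruct the radiation pattern at arbitrary (e.g. unprobed) directions."""
    return real_sh_order1(dirs) @ coeffs


def unit_vectors(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to Cartesian unit vectors (P, 3)."""
    az, el = np.deg2rad(azimuth_deg), np.deg2rad(elevation_deg)
    return np.column_stack([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
```

With sampling restricted to a hemisphere, the least-squares problem becomes increasingly ill-conditioned as the order grows. A practical fit at higher orders would likely need regularization.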
Background: Emotions and mood are important for overall well-being. Therefore, the search for continuous, effortless emotion prediction methods is an important field of study. Mobile sensing provides a promising tool and can capture one of the most telling signs of emotion: language.
EURASIP J Audio Speech Music Process
July 2021
If music is the language of the universe, musical note onsets may be the syllables for this language. Not only do note onsets define the temporal pattern of a musical piece, but their time-frequency characteristics also contain rich information about the identity of the musical instrument producing the notes. Note onset detection (NOD) is the basic component for many music information retrieval tasks and has attracted significant interest in audio signal processing research.
EURASIP J Audio Speech Music Process
May 2021
Among the various characteristics of a speech signal, the expression of emotion is one of those with the slowest temporal dynamics. Hence, a high-performing speech emotion recognition (SER) system requires a predictive model that is capable of learning sufficiently long temporal dependencies in the analysed speech signal. Therefore, in this work, we propose a novel end-to-end neural network architecture based on the concept of dilated causal convolution with context stacking.
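Dilated causal convolution, the building block named in this abstract, can be sketched in a few lines. This is a generic single-channel illustration with no bias and no nonlinearity, not the proposed architecture. The context-stacking mechanism is not modelled, and `dilated_causal_conv1d` and `receptive_field` are hypothetical names. Each output depends only on current and past samples. Doubling the dilation at every layer makes the receptive field grow exponentially with depth, which is how such models can capture long temporal dependencies.

```python
import numpy as np


def dilated_causal_conv1d(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k * dilation], treating x[n] = 0 for n < 0 (causal)."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k, wk in enumerate(w):
        shift = k * dilation
        if shift < len(x):
            # Tap k reads the input `shift` samples in the past
            y[shift:] += wk * x[:len(x) - shift]
    return y


def receptive_field(kernel_size, dilations):
    """Number of past-and-present input samples seen by a stack of dilated causal layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, three layers with kernel size 2 and dilations 1, 2 and 4 already cover 8 input samples. Each further doubling of the dilation adds another factor of two, while the number of weights grows only linearly with depth.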
One of the current gaps in teleaudiology is the lack of methods for adult hearing screening viable for use in individuals of unknown language and in varying environments. We have developed a novel automated speech-in-noise test that uses stimuli viable for use in non-native listeners. The test reliability has been demonstrated in laboratory settings and in uncontrolled environmental noise settings in previous studies.
Purpose: The aim of this study was to develop and evaluate a novel, automated speech-in-noise test viable for widespread in situ and remote screening. Method: Vowel-consonant-vowel sounds in a multiple-choice consonant discrimination task were used. Recordings from a professional male native English speaker were used.
An experiment was conducted to identify the perceptual effects of acoustical properties of domestic listening environments, in a stereophonic reproduction scenario. Nine sound fields, originating from four rooms, were captured and spatially reproduced over a three-dimensional loudspeaker array. A panel of ten expert assessors identified and quantified the perceived differences of those sound fields using their own perceptual attributes.
An experiment was conducted to determine the perceptual effects of car cabin acoustics on the reproduced sound field. In-car measurements were conducted whilst the cabin's interior was physically modified. The captured sound fields were recreated in the laboratory using a three-dimensional loudspeaker array.
Subjective audio quality evaluation experiments have been conducted to assess the performance of embedded-optimization-based precompensation algorithms for mitigating perceptible linear and nonlinear distortion in audio signals. It is concluded with statistical significance that the perceived audio quality is improved by applying an embedded-optimization-based precompensation algorithm, both when (i) only nonlinear distortion is present and when (ii) a combination of linear and nonlinear distortion is present. Moreover, a significant positive correlation is reported between the collected subjective and objective PEAQ audio quality scores, supporting the validity of using PEAQ to predict the impact of linear and nonlinear distortion on the perceived audio quality.