This paper provides a comprehensive review of the use of computational bioacoustics, together with signal and speech processing techniques, in the analysis of primate vocal communication. We explore the potential of machine learning and deep learning methods, from simple supervised algorithms to more recent self-supervised models, for processing and analyzing the large data sets made available by the emergence of passive acoustic monitoring approaches. In addition, we discuss the importance of automated primate vocalization analysis in tackling essential questions on animal communication, and we highlight the role of comparative linguistics in bioacoustic research.
The study of non-human animals' communication systems generally relies on the transcription of vocal sequences using a finite set of discrete units. This set is referred to as a vocal repertoire, which is specific to a species or a sub-group of a species. When conducted by human experts, the formal description of vocal repertoires can be laborious and/or biased.
We present an analysis of fin whale (Balaenoptera physalus) songs on passive acoustic recordings from the Pelagos Sanctuary (Western Mediterranean Basin). The recordings were gathered between 2008 and 2018 using 2 different hydrophone stations. We show how 20 Hz fin whale pulses can be automatically detected using a low complexity convolutional neural network (CNN) despite data variability (different recording devices exposed to diverse noises).
When listeners misperceive words in noise, do they report words that are more common? Lexical frequency differences between misperceived and target words in English and Spanish were examined for five masker types. Misperceptions had a higher lexical frequency in the presence of pure energetic maskers, but frequency effects were reduced or absent for informational maskers. The tendency to report more common words increased with the degree of energetic masking, suggesting that uncertainty about segment identity provides a role for lexical frequency.
This paper presents a bi-view (front and side) audiovisual Lombard speech corpus, which is freely available for download. It contains 5400 utterances (2700 Lombard and 2700 plain reference utterances), produced by 54 talkers, with each utterance in the dataset following the same sentence format as the audiovisual "Grid" corpus [Cooke, Barker, Cunningham, and Shao (2006). J.
Better use of the increasing functional capabilities of home automation systems and Internet of Things (IoT) devices to support the needs of users with disabilities is the subject of a research project currently conducted by Area Ausili (Assistive Technology Area), a department of Polo Tecnologico Regionale Corte Roncati of the Local Health Trust of Bologna (Italy), in collaboration with the AIAS Ausilioteca Assistive Technology (AT) Team. The main aim of the project is to develop experimental low-cost systems for environmental control through simplified and accessible user interfaces. Many of the activities focus on automatic speech recognition and are developed in the framework of the CloudCAST project.
Words spoken against a noise background often form an ambiguous percept. However, in certain conditions, a listener will mishear a noisy word but report hearing the same incorrect word as reported by other listeners. These consistent hearing errors are valuable as tests of detailed models of speech perception.