Publications by authors named "Lazaros Vrysis"

Speaker diarization consists of answering the question of "who spoke when" in audio recordings. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by the exploitation of spatial features. This work proposes a framework designed to assess the effectiveness of combining speaker embeddings with Time Difference of Arrival (TDOA) values from available microphone sensor arrays in meetings.

This study presents a novel audio compression technique, tailored for environmental monitoring within multi-modal data processing pipelines. Considering the crucial role that audio data play in environmental evaluations, particularly in contexts with extreme resource limitations, our strategy substantially decreases bit rates to facilitate efficient data transfer and storage. This is accomplished without undermining the accuracy necessary for trustworthy air pollution analysis while simultaneously minimizing processing expenses.

Social media platforms have led to the creation of a vast amount of user-generated, publicly available information, facilitating participation in the public sphere but also giving certain users the opportunity to publish hateful content. This content mainly involves offensive or discriminatory speech towards social groups or individuals (based on racial, religious, gender or other characteristics) and could lead to subsequent hate actions or crimes through persistent escalation. Content management and moderation at big-data volumes can no longer be handled manually.

In this paper, an audio-driven, multimodal approach for speaker diarization in multimedia content is introduced and evaluated. The proposed algorithm is based on semi-supervised clustering of audio-visual embeddings, generated using deep learning techniques. The two modes, audio and video, are separately addressed; a long short-term memory Siamese neural network is employed to produce embeddings from audio, whereas a pre-trained convolutional neural network is deployed to generate embeddings from two-dimensional blocks representing the faces of speakers detected in video frames.

This study aimed to develop and evaluate a software application capable of conducting Pure-Tone Audiometry tests in clinical practice. We designed and developed a mobile software application for iPad devices that performs Pure-Tone Audiometry according to ANSI and IEC standards. The application is intended to be operated by a trained audiologist inside a sound booth.