Publications by Doroteo T Toledano

Feasibility of Big Data Analytics to Assess Personality Based on Voice Analysis.

Víctor J Rubio, David Aguado, Doroteo T Toledano, María Pilar Fernández-Gallego

Sensors (Basel)

November 2024

(1) Background: As far back as the 1930s, it was already thought that gestures, clothing, speech, posture, and gait could express an individual's personality. Different research programs, some focused on linguistic cues, were launched, though results were inconsistent. The development of new speech analysis technology and the generalization of big data analysis have created an opportunity to test the predictive power of voice features on personality dimensions.

Analysis and interpretation of joint source separation and sound event detection in domestic environments.

Diego de Benito-Gorrón, Katerina Zmolikova, Doroteo T Toledano

PLoS One

July 2024

In recent years, the relation between Sound Event Detection (SED) and Source Separation (SSep) has received growing interest, particularly with the aim of enhancing SED performance by leveraging the synergies between the two tasks. In this paper, we present a detailed description of JSS (Joint Source Separation and Sound Event Detection), our joint-training scheme for SSep and SED, and we measure its performance in the DCASE Challenge for SED in domestic environments. Our experiments demonstrate that JSS can improve SED performance, in terms of Polyphonic Sound Detection Score (PSDS), even without additional training data.

Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT.

Doroteo T Toledano, María Pilar Fernández-Gallego, Alicia Lozano-Diez

PLoS One

March 2019

Speech analysis for Automatic Speech Recognition (ASR) systems typically starts with a Short-Time Fourier Transform (STFT), which implies selecting a fixed point in the time-frequency resolution trade-off. This approach, combined with a Mel-frequency scaled filterbank and a Discrete Cosine Transform, gives rise to the Mel-Frequency Cepstral Coefficients (MFCC), which have been the most common features in speech processing for decades. These features were particularly well suited to the previous state of the art in ASR, Hidden Markov Models/Gaussian Mixture Models (HMM/GMM).
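
The pipeline this abstract describes (STFT, Mel-scaled filterbank, log, Discrete Cosine Transform) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names and the parameter values (16 kHz audio, 25 ms window, 10 ms hop, 26 filters, 13 coefficients) are common textbook defaults assumed here for illustration. The single `frame_len` argument is what fixes the time-frequency resolution trade-off mentioned above.

```python
import numpy as np

def hz_to_mel(hz):
    # Widely used Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with centers equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, n_coeffs):
    # Unnormalized DCT-II along the filter axis; x has shape (frames, filters).
    n = x.shape[1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    # 1) Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2) STFT power spectrum: one fixed window length for every frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3) Mel filterbank energies, then log compression.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 4) DCT decorrelates the log energies; keep the first n_coeffs.
    return dct_ii(log_energies, n_coeffs)

# Usage: one second of a synthetic 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)
```

A multi-resolution variant, as the paper's title suggests, would presumably repeat steps 1 to 4 with several `frame_len` values and combine the results; how the authors actually do this is not stated in the excerpt above.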

An analysis of the influence of deep neural network (DNN) topology in bottleneck feature based language recognition.

Alicia Lozano-Diez, Ruben Zazo, Doroteo T Toledano, Joaquin Gonzalez-Rodriguez

PLoS One

October 2017

Language recognition systems based on bottleneck features have recently become the state of the art in this research field, as shown by their success in the last Language Recognition Evaluation (LRE 2015) organized by NIST (U.S. National Institute of Standards and Technology).

Reviewing the connection between speech and obstructive sleep apnea.

Fernando Espinoza-Cuadros, Rubén Fernández-Pozo, Doroteo T Toledano, José D Alcázar-Ramírez, Eduardo López-Gonzalo

Biomed Eng Online

February 2016

Background: Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to the hypothesis that automatic speech analysis could be used for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications.

Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks.

Ruben Zazo, Alicia Lozano-Diez, Javier Gonzalez-Dominguez, Doroteo T Toledano, Joaquin Gonzalez-Rodriguez

PLoS One

July 2016

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (∼3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research that used proprietary LSTM implementations and huge computational resources, which made those earlier results difficult to reproduce.

Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment.

Fernando Espinoza-Cuadros, Rubén Fernández-Pozo, Doroteo T Toledano, José D Alcázar-Ramírez, Eduardo López-Gonzalo

Comput Math Methods Med

August 2016

Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure that requires the patient to stay overnight at the hospital. This has led to proposals for less costly procedures, based on the analysis of patients' facial images and voice recordings, to help in OSA detection and severity assessment.
