This paper describes open-source software (available at https://github.com/robotology/natural-speech) for building automatic speech recognition (ASR) systems and running them within the YARP platform. The toolkit is designed (i) to allow non-ASR experts to easily create their own ASR system and run it on iCub and (ii) to build deep learning-based models that specifically address the main challenges an ASR system faces in verbal human-iCub interaction. The toolkit consists mostly of Python and C++ code and shell scripts integrated in YARP. As an additional contribution, a second codebase (written in Matlab) is provided for more expert ASR users who want to experiment with bio-inspired and developmental learning-inspired ASR systems. Specifically, we provide code for two distinct kinds of speech recognition: "articulatory" and "unsupervised" speech recognition. The first is largely inspired by influential neurobiological theories of speech perception which posit that speech perception is mediated by activity in the brain's motor cortex. Our articulatory systems have been shown to outperform strong deep learning-based baselines. The second type of recognition system, the "unsupervised" system, does not use any supervised information (unlike most ASR systems, including our articulatory systems). To some extent, these systems mimic an infant who has to discover the basic speech units of a language by herself. In addition, we provide resources consisting of pre-trained deep learning models for ASR and a 2.5-h speech dataset of spoken commands, the VoCub dataset, which can be used to adapt an ASR system to the typical acoustic environments in which iCub operates.
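To make the idea of discovering speech units without any labels more concrete, here is a minimal Python sketch. It is not the toolkit's Matlab code and does not reproduce the authors' unsupervised systems; it assumes, as a simple stand-in, that an utterance is already available as a sequence of acoustic feature frames (for example MFCC vectors), and it clusters those frames with k-means so that each cluster acts as a pseudo speech unit. The function names `kmeans_units` and `collapse_repeats` and the farthest-point initialisation are hypothetical, illustrative choices.

```python
import numpy as np


def kmeans_units(frames: np.ndarray, n_units: int, n_iter: int = 50, seed: int = 0):
    """Cluster feature frames (T x D) into n_units pseudo speech units.

    Illustrative sketch only (hypothetical helper): plain Lloyd's k-means with
    farthest-point initialisation; no transcriptions or labels are used.
    Returns (centroids, labels), where labels[t] is the unit index of frame t.
    """
    rng = np.random.default_rng(seed)
    # Farthest-point init: start from a random frame, then repeatedly add the
    # frame that is farthest from all centroids chosen so far.
    idx = [int(rng.integers(len(frames)))]
    for _ in range(1, n_units):
        d = ((frames[:, None, :] - frames[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = frames[idx].astype(float)

    def assign(c):
        # Squared Euclidean distance from every frame to every centroid: (T, K).
        dists = ((frames[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        new_centroids = centroids.copy()
        for k in range(n_units):
            members = frames[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[k] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment so labels always match the returned centroids.
    return centroids, assign(centroids)


def collapse_repeats(labels):
    """Turn frame-level labels into a unit sequence, e.g. [0, 0, 1, 1, 0] -> [0, 1, 0]."""
    units = []
    for lab in labels:
        if not units or units[-1] != lab:
            units.append(int(lab))
    return units
```

Frame-level clustering like this is only a crude baseline for unit discovery: it ignores temporal structure within a unit and the number of units must be fixed in advance, which is why more elaborate unsupervised systems model duration and context explicitly.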

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805979
DOI: http://dx.doi.org/10.3389/frobt.2018.00010

Similar Publications

Objectives: Hearing impairment during childhood is a widespread health issue. Prompt recognition and timely intervention are vital for the advancement of language skills. Insufficient parental knowledge can lead to a delay in diagnosing and treating a condition, which can have a negative impact on academic performance.

Minimally invasive Ponto surgery (MIPS) enables the installation of percutaneous bone-anchored hearing implants (BAHIs) with a drill guide through a hole punch incision. Despite being well established for adults, there is a lack of studies in the literature regarding its use in pediatric patients. The aim of the present study was to investigate the hearing performance and soft-tissue outcomes of the use of MIPS under local anesthesia in children with unilateral craniofacial malformation (UCM).

Objectives: Bimodal cochlear implant (CI) users vary in speech recognition outcomes. This variability may be influenced partly by the CI and contralateral hearing aid (HA) programming procedures, which can result in mismatches in latency and frequency. We assessed the performance of bimodal listeners when latency mismatches were corrected and analyzed how frequency mismatches influenced outcomes.

Automatic speech recognition predicts contemporaneous earthquake fault displacement.

Nat Commun

January 2025

Los Alamos National Laboratory, EES-17 National Security Earth Science, Los Alamos, NM, 87545, USA.

Significant progress has been made in probing the state of an earthquake fault by applying machine learning to continuous seismic waveforms. The breakthroughs were originally obtained from laboratory shear experiments and numerical simulations of fault shear, then successfully extended to slow-slipping faults. Here we apply the Wav2Vec-2.

Objectives: An improvement in speech perception is a major well-documented benefit of cochlear implantation (CI), which is commonly discussed with CI candidates to set expectations. However, a large variability exists in speech perception outcomes. We evaluated the accuracy of clinical predictions of post-CI speech perception scores.
