Speech is a widely used interaction technique in edutainment systems and a key technology for smooth educational learning and user-system interaction. However, its application in real environments is limited by ambient noise. In this study, a multimodal interaction system based on audio and visual information is proposed that makes speech-driven virtual aquarium systems robust to ambient noise. For audio-based speech recognition, the words recognized by a speech API are expressed as word vectors using a pretrained model; vision-based speech recognition uses a composite end-to-end deep neural network. The vectors derived from the API and from vision are then concatenated and classified. The signal-to-noise ratio of the proposed system was determined using data from four types of noise environments, and its accuracy and efficiency were tested against existing single-mode strategies for visual feature extraction and audio speech recognition. The average recognition rate was 91.42% when only speech was used and improved by 6.7 percentage points to 98.12% when audio and visual information were combined. This method can be helpful in real-world settings where speech recognition is regularly used, such as cafés, museums, music halls, and kiosks.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9609693
DOI: http://dx.doi.org/10.3390/s22207738
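As a rough illustration of the fusion step described in the abstract above, the sketch below concatenates an audio-derived word vector with a vision-derived feature vector and classifies the result. The dimensions, vocabulary size, and module structure are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the late-fusion idea: concatenate an audio-derived
# word vector with a vision-derived feature vector, then classify.
# All names and dimensions below are assumptions for illustration.
import torch
import torch.nn as nn

AUDIO_DIM = 300   # e.g., pretrained word embedding of the speech API's hypothesis
VISUAL_DIM = 512  # e.g., penultimate features of a lip-reading network
NUM_WORDS = 20    # assumed size of the command vocabulary

class LateFusionClassifier(nn.Module):
    """Concatenate audio- and vision-derived vectors, then classify the word."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(AUDIO_DIM + VISUAL_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, NUM_WORDS),
        )

    def forward(self, audio_vec, visual_vec):
        fused = torch.cat([audio_vec, visual_vec], dim=-1)  # late fusion by concatenation
        return self.head(fused)

# Dummy tensors stand in for real embeddings and visual features.
model = LateFusionClassifier()
logits = model(torch.randn(1, AUDIO_DIM), torch.randn(1, VISUAL_DIM))
print(logits.argmax(dim=-1))  # predicted word index
```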
J Speech Lang Hear Res
January 2025
Centre for Language Studies, Radboud University, Nijmegen, the Netherlands.
Purpose: In this review article, we present an extensive overview of recent developments in dysarthric speech research. A key objective of speech technology research is to improve users' quality of life, as reflected in the current trend toward inclusive conversational interfaces that cater to pathological speech, of which dysarthric speech is an important example. Applying speech technology to dysarthric speech demands a clear understanding both of the acoustics of dysarthric speech and of speech technologies, including machine learning and deep neural networks for speech processing.
J Neurol
January 2025
Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, Praha 6, 16000, Prague, Czech Republic.
Background And Objectives: Patients with synucleinopathies such as multiple system atrophy (MSA) and Parkinson's disease (PD) frequently display speech and language abnormalities. We explore the diagnostic potential of automated linguistic analysis of natural spontaneous speech to differentiate MSA and PD.
Methods: Spontaneous speech from 39 participants with MSA, 39 drug-naive participants with PD, and 39 healthy controls matched for age and sex was transcribed and linguistically annotated using automatic speech recognition and natural language processing.
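To make the annotation step concrete, the sketch below derives a few simple linguistic features from an ASR transcript using spaCy. The feature set and the English model are assumptions for illustration; the study analyzed Czech speech with its own feature battery.

```python
# Minimal sketch: given an ASR transcript, compute simple lexical and
# syntactic features via NLP annotation. Features here are assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")  # English stand-in; the study used Czech

def linguistic_features(transcript: str) -> dict:
    doc = nlp(transcript)
    tokens = [t for t in doc if t.is_alpha]   # keep word tokens only
    sents = list(doc.sents)
    n = max(len(tokens), 1)
    return {
        "verb_rate": sum(t.pos_ == "VERB" for t in tokens) / n,
        "noun_rate": sum(t.pos_ == "NOUN" for t in tokens) / n,
        "mean_sentence_len": n / max(len(sents), 1),
    }

print(linguistic_features("I went to the shop. Then I cooked dinner."))
```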
Ophthalmologie
January 2025
Augenklinik Sulzbach, Knappschaftsklinikum Saar, An der Klinik 10, 66280 Sulzbach/Saar, Germany.
Background: The increasing bureaucratic burden in everyday clinical practice impairs doctor-patient communication (DPC). Effective use of digital technologies, such as automated semantic speech recognition (ASR) with automated extraction of diagnostically relevant information, can provide a solution.
Objective: The aim was to determine to what extent ASR combined with semantic information extraction for automated documentation of the doctor-patient dialogue (ADAPI) can be integrated into everyday clinical practice, using the IVI routine as an example, and whether patient care can be improved through process optimization.
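As a concrete illustration of the extraction step, here is a minimal pattern-based sketch that pulls diagnostically relevant items from a dialogue transcript. The field names, drug list, and regexes are hypothetical stand-ins, not the ADAPI system's actual logic.

```python
# Minimal sketch: extract diagnostically relevant items from a transcript
# with simple patterns. All terms and patterns below are hypothetical.
import re

PATTERNS = {
    "visual_acuity": re.compile(r"\b\d+/\d+\b"),  # e.g., "20/40"
    "medication":    re.compile(r"\b(ranibizumab|aflibercept|bevacizumab)\b", re.I),
    "diagnosis":     re.compile(r"\b(macular edema|AMD|retinal vein occlusion)\b", re.I),
}

def extract_findings(transcript: str) -> dict:
    return {field: pat.findall(transcript) for field, pat in PATTERNS.items()}

note = "Vision is 20/40 on the right; we will continue aflibercept for the macular edema."
print(extract_findings(note))
# {'visual_acuity': ['20/40'], 'medication': ['aflibercept'], 'diagnosis': ['macular edema']}
```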
Dev Sci
March 2025
MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, Australia.
The classical view is that perceptual attunement to the native language, which emerges by 6-10 months, developmentally precedes phonological feature abstraction abilities. That assumption is challenged by findings from adults who were adopted into a new language environment at 3-5 months, which imply that they had already formed phonological feature abstractions about their birth language before 6 months. Because phonological feature abstraction had not been directly tested in infants, we examined 4- to 6-month-olds' amodal abstraction of the labial versus coronal place-of-articulation distinction between consonants.
Sci Rep
January 2025
School of Mathematics and Computer Science, Tongling University, Tongling, 244061, China.
The application of artificial neural networks (ANNs) can be found in numerous fields, including image and speech recognition, natural language processing, and autonomous vehicles. Intrusion detection, the subject of this paper, likewise relies heavily on them, and a variety of intrusion detection models have been constructed using ANNs.
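As a minimal sketch of the idea, the example below trains a small multilayer perceptron to classify network-flow feature vectors as benign or attack. The synthetic data, feature count, and model configuration are illustrative assumptions, not any specific model from the paper.

```python
# Minimal sketch of ANN-based intrusion detection: a small MLP classifies
# per-connection feature vectors. Data and dimensions are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))           # 20 flow features per connection (assumed)
y = (X[:, 0] + X[:, 3] > 1).astype(int)   # toy "attack" rule for demonstration only

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```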