Acoustic compression in Zoom audio does not compromise voice recognition performance.

Sci Rep

Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, 8050, Zurich, Switzerland.

Published: October 2023

Human voice recognition over telephone channels typically yields lower accuracy than recognition from higher-quality audio recorded in a studio environment. Here, we investigated the extent to which audio in video conferencing, which is subject to various lossy compression mechanisms, affects human voice recognition performance. Voice recognition performance was tested in an old-new recognition task under three audio conditions (telephone, Zoom, studio) across all matched combinations (familiarization and test in the same audio condition) and mismatched combinations (familiarization and test in different audio conditions). Participants were familiarized with female voices presented in either studio-quality (N = 22), Zoom-quality (N = 21), or telephone-quality (N = 20) stimuli. Subsequently, all listeners performed an identical voice recognition test containing a balanced stimulus set from all three conditions. Results revealed that voice recognition performance (d') in Zoom audio was not significantly different from that in studio audio, but listeners performed significantly better in both Zoom and studio audio than in telephone audio. This suggests that the signal processing of the speech codec used by Zoom provides information that is as relevant for voice recognition as that in studio audio. Interestingly, listeners familiarized with voices via Zoom audio showed a trend towards better recognition performance in the test (p = 0.056) than listeners familiarized with studio audio. We discuss future directions in which a possible advantage of Zoom audio for voice recognition might be related to some of the speech coding mechanisms used by Zoom.
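For readers unfamiliar with the sensitivity index d' reported above, the following is a minimal sketch of how d' is commonly computed for an old-new recognition task: the z-transformed hit rate minus the z-transformed false-alarm rate. The function name `d_prime`, its argument names, and the log-linear correction are assumptions made for illustration; the abstract does not state which correction, if any, the authors applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    "Old" trials (voices heard during familiarization) yield hits or misses;
    "new" trials (unfamiliar voices) yield false alarms or correct rejections.
    """
    # Log-linear correction (an assumption, not taken from the paper):
    # adding 0.5 to each count keeps both rates strictly between 0 and 1,
    # so the inverse normal CDF stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


# Hypothetical listener: 20 old and 20 new trials.
chance = d_prime(hits=10, misses=10, false_alarms=10, correct_rejections=10)
good = d_prime(hits=18, misses=2, false_alarms=2, correct_rejections=18)
```

A d' of zero corresponds to chance-level discrimination between old and new voices, and larger values indicate better recognition, independent of a listener's overall bias towards answering "old" or "new".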

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10618539
DOI: http://dx.doi.org/10.1038/s41598-023-45971-x
