The use of face masks has increased dramatically since the start of the COVID-19 pandemic in order to curb the spread of the disease. Breakthrough infections caused by the Delta and Omicron variants have further reinforced the importance of wearing a face mask, even for vaccinated individuals. However, face masks also attenuate speech signals, and this change may impact speech processing technologies such as automatic speaker verification (ASV) and speech-to-text conversion. In this paper we evaluate ASV systems on speech samples recorded in the presence of three different types of face mask: surgical, cloth, and filtered N95, and analyze the impact on acoustics and other factors. In addition, we explore the effects of different microphones and of the distance between speaker and microphone, and the impact of face masks when speakers use ASV systems in real-world scenarios. Our analysis shows a significant deterioration in performance when an ASV system encounters different face masks, microphones, and variable speaker-to-microphone distances. To address this problem, we propose a novel framework that overcomes this performance degradation by realigning the ASV system. The novelty of the proposed framework is as follows. First, we propose a fused feature descriptor that concatenates the novel Ternary Deviated overlapping Patterns (TDoP), Mel Frequency Cepstral Coefficients (MFCC), and Gammatone Cepstral Coefficients (GTCC); this descriptor is used by both the ensemble-learning-based ASV and the anomaly detection system in the proposed architecture. Second, we propose an anomaly detection model for identifying vocal samples produced in the presence of face masks. Next, we present a Peak Norm (PN) filter that approximates the mask-free signal of the speaker in order to boost the accuracy of ASV systems.
Finally, the features of samples filtered with the PN filter and of samples recorded without face masks are passed to the proposed ASV system to test for improved accuracy. The proposed system achieved accuracies of 0.99 and 0.92 on samples recorded without and with face masks, respectively. Although face masks degrade ASV performance, the PN filtering solution recovers up to 4% of the lost accuracy. Similarly, under different microphones and distances, the PN approach enhanced system accuracy by up to 7% and 9%, respectively. The results demonstrate the effectiveness of the presented framework on an in-house, diverse Multi Speaker Face Masks (MSFM) dataset (IRB No. FY2021-83) consisting of samples of subjects recorded with a variety of face masks and microphones and from different distances.
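The two reusable building blocks described in the abstract, peak normalization and per-frame feature fusion, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact design of the PN filter and the TDoP/GTCC extraction steps are not given in the abstract, so the PN filter is reduced here to plain peak normalization, and the three feature matrices are random stand-ins with assumed dimensions (8 TDoP, 13 MFCC, and 13 GTCC coefficients per frame).

```python
import numpy as np

def peak_normalize(signal, target_peak=0.95):
    """Simplified stand-in for the paper's PN filter: rescale the
    waveform so its maximum absolute amplitude equals target_peak."""
    peak = np.max(np.abs(signal))
    if peak == 0:
        return signal.copy()
    return signal * (target_peak / peak)

def fuse_features(tdop, mfcc, gtcc):
    """Fused descriptor: per-frame concatenation of the TDoP, MFCC and
    GTCC feature matrices (each shaped frames x coefficients)."""
    assert tdop.shape[0] == mfcc.shape[0] == gtcc.shape[0]
    return np.concatenate([tdop, mfcc, gtcc], axis=1)

# Toy example with random stand-ins for the three feature matrices.
rng = np.random.default_rng(0)
frames = 100
tdop = rng.normal(size=(frames, 8))
mfcc = rng.normal(size=(frames, 13))
gtcc = rng.normal(size=(frames, 13))

fused = fuse_features(tdop, mfcc, gtcc)
print(fused.shape)  # (100, 34)

# Peak-normalize a toy waveform before feature extraction.
wave = 0.2 * np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000))
normed = peak_normalize(wave)
print(round(float(np.max(np.abs(normed))), 2))  # 0.95
```

Concatenating along the coefficient axis keeps the frame alignment intact, so the same fused matrix can feed both the ensemble classifier and the anomaly detector described above.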


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9003118
DOI: http://dx.doi.org/10.3390/s22072638


Similar Publications

Sterilization and Filter Performance of Nano- and Microfibrous Facemask Filters - Electrospinning and Restoration of Charges for Competitive Sustainable Alternatives.

Macromol Rapid Commun

December 2024

Empa, Swiss Federal Laboratories for Materials Science and Technology, Laboratory for Biomimetic Membranes and Textiles, St. Gallen, 9014, Switzerland.

Facemask materials have been under constant development to optimize filtration performance, wear comfort, and general resilience to chemical and mechanical stress. While single-use polypropylene meltblown membranes are the established go-to material for high-performing mask filters, they are neither sustainable nor particularly resistant to sterilization methods. Herein an in-depth analysis is provided of the sterilization efficiency, filtration efficiency, and breathing resistance of selected aerosol filters commonly implemented in facemasks, with a particular focus on the benefits of nanofibrous filters.


Evaluating Medical Image Segmentation Models Using Augmentation.

Tomography

December 2024

Clinic for Radiology and Nuclear Medicine, University Hospital, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany.

Background: Medical image segmentation is an essential step in both clinical and research applications, and automated segmentation models such as TotalSegmentator have become ubiquitous. However, robust methods for validating the accuracy of these models remain limited, and manual inspection is often necessary before the segmentation masks produced by these models can be used.

Methods: To address this gap, we have developed a novel validation framework for segmentation models, leveraging data augmentation to assess model consistency.


Towards Robust Supervised Pectoral Muscle Segmentation in Mammography Images.

J Imaging

December 2024

Computer Science and Engineering Department, College of Engineering, University of Nevada, Reno, Main Campus, Reno, NV 89557, USA.

Mammography images are the most commonly used tool for breast cancer screening. The presence of pectoral muscle in images for the mediolateral oblique view makes designing a robust automated breast cancer detection system more challenging. Most of the current methods for removing the pectoral muscle are based on traditional machine learning approaches.


This study introduced a novel approach to 3D image segmentation utilizing a neural network framework applied to 2D depth map imagery, with Z axis values visualized through color gradation. This research involved comprehensive data collection from mechanically harvested wild blueberries to populate 3D and red-green-blue (RGB) images of filled totes through time-of-flight and RGB cameras, respectively. Advanced neural network models from the YOLOv8 and Detectron2 frameworks were assessed for their segmentation capabilities.


IngredSAM: Open-World Food Ingredient Segmentation via a Single Image Prompt.

J Imaging

November 2024

Architecture and Design College, Nanchang University, No. 999, Xuefu Avenue, Honggutan New District, Nanchang 330031, China.

Food semantic segmentation is of great significance in the field of computer vision and artificial intelligence, especially in the application of food image analysis. Due to the complexity and variety of food, it is difficult to effectively handle this task using supervised methods. Thus, we introduce IngredSAM, a novel approach for open-world food ingredient semantic segmentation, extending the capabilities of the Segment Anything Model (SAM).

