Sensors (Basel)
April 2024
This paper addresses a joint training approach applied to a pipeline comprising speech enhancement (SE) and automatic speech recognition (ASR) models, where an acoustic tokenizer is included in the pipeline to leverage the linguistic information from the ASR model to the SE model. The acoustic tokenizer takes the outputs of the ASR encoder and provides a pseudo-label through K-means clustering. To transfer the linguistic information, represented by pseudo-labels, from the acoustic tokenizer to the SE model, a cluster-based pairwise contrastive (CBPC) loss function is proposed, which is a self-supervised contrastive loss function, and combined with an information noise contrastive estimation (infoNCE) loss function.
View Article and Find Full Text PDFSensors (Basel)
August 2023
This paper proposes an Informer-based temperature prediction model to leverage data from an automatic weather station (AWS) and a local data assimilation and prediction system (LDAPS), where the Informer as a variant of a Transformer was developed to better deal with time series data. Recently, deep-learning-based temperature prediction models have been proposed, demonstrating successful performances, such as conventional neural network (CNN)-based models, bi-directional long short-term memory (BLSTM)-based models, and a combination of both neural networks, CNN-BLSTM. However, these models have encountered issues due to the lack of time data integration during the training phase, which also lead to the persistence of a long-term dependency problem in the LSTM models.
View Article and Find Full Text PDFIn this paper, we propose an end-to-end (E2E) neural network model to detect autism spectrum disorder (ASD) from children's voices without explicitly extracting the deterministic features. In order to obtain the decisions for discriminating between the voices of children with ASD and those with typical development (TD), we combined two different feature-extraction models and a bidirectional long short-term memory (BLSTM)-based classifier to obtain the ASD/TD classification in the form of probability. We realized one of the feature extractors as the bottleneck feature from an autoencoder using the extended version of the Geneva minimalistic acoustic parameter set (eGeMAPS) input.
View Article and Find Full Text PDFIn this paper, a new two-step joint optimization approach based on the asynchronous subregion optimization method is proposed for training a pipeline model composed of two different models. The first-step processing of the proposed joint optimization approach trains the front-end model only, and the second-step processing trains all the parameters of the combined model together. In the asynchronous subregion optimization method, the first-step processing only supports the goal of the front-end model.
View Article and Find Full Text PDFIn this paper, we propose a new compression method using underwater acoustic sensor signals for underwater surveillance. Generally, sonar applications that are used for surveillance or ocean monitoring are composed of many underwater acoustic sensors to detect significant sources of sound. It is necessary to apply compression methods to the acquired sensor signals due to data processing and storage resource limitations.
View Article and Find Full Text PDFThe spread of antibiotic-resistant strains of , a gram-positive opportunistic pathogen, has increased due to the frequent use of antibiotics. Inhibition of the quorum-sensing systems of biofilm-producing strains using plant extracts represents an efficient approach for controlling infections. is a medicinal herb showing various bioactivities; however, no studies have reported the anti-biofilm effects of extracts against drug-resistant .
View Article and Find Full Text PDFWeather is affected by a complex interplay of factors, including topography, location, and time. For the prediction of temperature in Korea, it is necessary to use data from multiple regions. To this end, we investigate the use of deep neural-network-based temperature prediction model time-series weather data obtained from an automatic weather station and image data from a regional data assimilation and prediction system (RDAPS).
View Article and Find Full Text PDFAutism spectrum disorder (ASD) is a developmental disorder with a life-span disability. While diagnostic instruments have been developed and qualified based on the accuracy of the discrimination of children with ASD from typical development (TD) children, the stability of such procedures can be disrupted by limitations pertaining to time expenses and the subjectivity of clinicians. Consequently, automated diagnostic methods have been developed for acquiring objective measures of autism, and in various fields of research, vocal characteristics have not only been reported as distinctive characteristics by clinicians, but have also shown promising performance in several studies utilizing deep learning models based on the automated discrimination of children with ASD from children with TD.
View Article and Find Full Text PDFThis paper proposes a sound event detection (SED) method in tunnels to prevent further uncontrollable accidents. Tunnel accidents are accompanied by crashes and tire skids, which usually produce abnormal sounds. Since the tunnel environment always has a severe level of noise, the detection accuracy can be greatly reduced in the existing methods.
View Article and Find Full Text PDFAn adaptive redundant speech transmission (ARST) approach to improve the perceived speech quality (PSQ) of speech streaming applications over wireless multimedia sensor networks (WMSNs) is proposed in this paper. The proposed approach estimates the PSQ as well as the packet loss rate (PLR) from the received speech data. Subsequently, it decides whether the transmission of redundant speech data (RSD) is required in order to assist a speech decoder to reconstruct lost speech signals for high PLRs.
View Article and Find Full Text PDFIn this paper, a packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed in order to improve the quality of decoded speech under burst packet loss conditions in a wireless sensor network. Conventional receiver-based PLC algorithms in the G.729 speech codec are usually based on speech correlation to reconstruct the decoded speech of lost frames by using parameter information obtained from the previous correctly received frames.
View Article and Find Full Text PDF