The goal of this study is to investigate advanced signal processing approaches [single frequency filtering (SFF) and zero-time windowing (ZTW)] combined with modern deep neural networks (DNNs) [convolutional neural networks (CNNs), temporal convolutional neural networks (TCNs), time-delay neural networks (TDNNs), and emphasized channel attention, propagation and aggregation in TDNN (ECAPA-TDNN)] for classifying the major dialects of English. Previous studies indicated that the SFF and ZTW methods provide higher spectro-temporal resolution. To capture the intrinsic variations in articulation among dialects, four feature representations [spectrogram (SPEC), cepstral coefficients, mel filter-bank energies, and mel-frequency cepstral coefficients (MFCCs)] are derived from the SFF and ZTW methods.
Background: Machine learning (ML) is a rapidly growing technology that provides effective solutions to complex problems. The application of ML techniques in healthcare is gaining attention because of their ability to identify patterns automatically. Diabetes is characterized by hyperglycemia resulting from impaired insulin secretion and/or insulin utilization.
Speech produced by a speaker in emotionally charged situations, such as anger, happiness, or shouting, is high arousal speech. Changes in production characteristics, such as increases in the subglottal air pressure, in the glottal closed phase of each cycle, and in the rate of glottal vibration, are observed in high arousal speech. Acoustic parameters such as the glottal closed quotient and fundamental frequency (F0) are used to characterize high arousal speech.