A model synthesizing average frequency components from select sentences in an electromagnetic articulography database has been crafted. This revealed the dual roles of the tongue: its dorsum acts like a carrier wave, and the tip acts as a modulation signal within the articulatory realm. This model illuminates anticipatory coarticulation's subtleties during speech planning.
Stitched images can offer a broader field of view, but their boundaries can be irregular and unpleasant. To address this issue, current methods for rectangling images start by distorting local grids multiple times to obtain rectangular images with regular boundaries. However, these methods can result in content distortion and missing boundary information.
Introduction: Speech production involves neurological planning and articulatory execution. How speakers prepare for articulation is a significant aspect of speech production research. Previous studies have focused on isolated words or short phrases to explore speech planning mechanisms linked to articulatory behaviors, including investigating the eye-voice span (EVS) during text reading.
IEEE Trans Neural Netw Learn Syst
August 2023
Speech emotion recognition (SER) plays an important role in human-computer interaction, which can provide better interactivity to enhance user experiences. Existing approaches tend to directly apply deep learning networks to distinguish emotions. Among them, the convolutional neural network (CNN) is the most commonly used method to learn emotional representations from spectrograms.
Sentence oral reading requires not only a coordinated effort in the visual, articulatory, and cognitive processes but also supposes a top-down influence from linguistic knowledge onto the visual-motor behavior. Despite a gradual recognition of a predictive coding effect in this process, there is currently no comprehensive demonstration of the time-varying brain dynamics that underlie the oral reading strategy. To address this, our study used a multimodal approach, combining real-time recording of electroencephalography, eye movements, and speech, with a comprehensive examination of regional, inter-regional, sub-network, and whole-brain responses.
Multi-focus image fusion is the process of fusing multiple images with different focus areas into a single fully focused image, which has important application value. Because current fusion methods retain the detail information of the original images poorly, a two-stage fusion architecture is designed. In the training phase, an encoder-decoder network that combines the polarized self-attention module with the DenseNet structure is designed for image reconstruction tasks to enhance the model's ability to retain the original information.
Constructing an efficient human emotion recognition model based on electroencephalogram (EEG) signals is significant for realizing emotional brain-computer interaction and improving machine intelligence. In this paper, we present a spatial-temporal feature fused convolutional graph attention network (STFCGAT) model based on multi-channel EEG signals for human emotion recognition. First, we combined the single-channel differential entropy (DE) feature with the cross-channel functional connectivity (FC) feature to extract both the temporal variation and spatial topological information of EEG.
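The DE feature named in this abstract can be illustrated with a minimal sketch. It assumes, as is common for EEG features, that a signal segment is approximately Gaussian, in which case DE has the closed form 0.5 ln(2πeσ²). The function name and the toy samples are hypothetical, and this is not the STFCGAT implementation.

```python
import math
import statistics


def differential_entropy(samples):
    """Differential entropy of a segment under a Gaussian assumption.

    Uses the closed form 0.5 * ln(2 * pi * e * variance); the Gaussian
    assumption is ours, not something stated in the abstract.
    """
    variance = statistics.pvariance(samples)  # population variance of the segment
    return 0.5 * math.log(2 * math.pi * math.e * variance)


# Toy single-channel segment with unit variance (hypothetical data).
segment = [-1.0, 1.0, -1.0, 1.0]
de_value = differential_entropy(segment)

# Doubling the amplitude multiplies the variance by 4, so DE grows by ln(2).
de_scaled = differential_entropy([2.0 * s for s in segment])
```

In practice such a value would typically be computed per channel, and often per frequency band, before being combined with connectivity features.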
In recent years, electroencephalography (EEG) studies on speech comprehension have been extended from a controlled paradigm to a natural paradigm. Under the hypothesis that the brain can be approximated as a linear time-invariant system, the neural response to natural speech has been investigated extensively using temporal response functions (TRFs). However, most studies have modeled TRFs in the electrode space, which is a mixture of brain sources and thus cannot fully reveal the functional mechanism underlying speech comprehension.
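The linear time-invariant hypothesis behind TRFs can be sketched as a forward model in which the predicted response is the stimulus convolved with the TRF. The sketch below is a generic illustration with hypothetical names and toy values; it does not reproduce the source-space modeling of the study.

```python
def predict_response(stimulus, trf):
    """Causal LTI forward model: r[t] = sum over k of trf[k] * stimulus[t - k]."""
    response = []
    for t in range(len(stimulus)):
        total = 0.0
        for k, weight in enumerate(trf):
            if t - k >= 0:  # ignore lags that reach before the recording starts
                total += weight * stimulus[t - k]
        response.append(total)
    return response


# Hypothetical TRF with three lags and an impulse stimulus.
trf = [0.5, 1.0, -0.25]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
shifted_impulse = [0.0, 1.0, 0.0, 0.0, 0.0]

r_impulse = predict_response(impulse, trf)          # the TRF itself, zero-padded
r_shifted = predict_response(shifted_impulse, trf)  # same shape, delayed by one sample
```

The response to an impulse returns the TRF itself, and delaying the stimulus delays the response by the same amount, which is the time-invariance property the hypothesis relies on.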
On-board system fault knowledge base (KB) is a collection of fault causes, maintenance methods, and interrelationships among on-board modules and components of high-speed railways, which plays a crucial role in knowledge-driven dynamic operation and maintenance (O&M) decisions for on-board systems. To solve the problem of multi-source heterogeneity of on-board system O&M data, an entity matching (EM) approach using the BERT model and semi-supervised incremental learning is proposed. The heterogeneous knowledge fusion task is formulated as a pairwise binary classification task of entities in the knowledge units.
Accurately perceiving the emotion conveyed by others' facial or verbal expressions is critical to successful social interaction. However, only a few studies have examined multimodal interactions in speech emotion, and findings on speech emotion perception are inconsistent. It remains unclear how the human brain perceives speech emotion of different valence under multimodal stimuli.
Efficient learning of spikes plays a valuable role in training spiking neural networks (SNNs) to have desired responses to input stimuli. However, current learning rules are limited to a binary form of spikes. The seemingly ubiquitous phenomenon of bursting in nervous systems suggests a new way to carry more information with spike bursts in addition to spike times.
Continuous dimensional emotion recognition from speech helps robots or virtual agents capture the temporal dynamics of a speaker's emotional state in natural human-robot interactions. Temporal modulation cues obtained directly from the time-domain model of auditory perception can better reflect temporal dynamics than the acoustic features usually processed in the frequency domain. Feature extraction, which can reflect temporal dynamics of emotion from temporal modulation cues, is challenging because of the complexity and diversity of the auditory perception model.
IEEE Trans Neural Netw Learn Syst
April 2022
Spiking neural networks (SNNs) are considered a potential candidate to overcome current challenges, such as the high power consumption of artificial neural networks (ANNs); however, a gap in recognition accuracy on various tasks remains between them. A conversion strategy was thus recently introduced to bridge this gap by mapping a trained ANN to an SNN. However, it is still unclear to what extent the resulting SNN can benefit from both the accuracy advantage of the ANN and the high efficiency of the spike-based paradigm of computation.
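The intuition usually given for ANN-to-SNN conversion is that the firing rate of an integrate-and-fire neuron under constant input approximates a ReLU activation. The sketch below illustrates that general idea only; the function names, the reset-by-subtraction rule, and the restriction to inputs of at most 1 are assumptions for illustration and are not taken from the paper.

```python
def if_firing_rate(input_current, n_steps=1000, threshold=1.0):
    """Firing rate of an integrate-and-fire neuron driven by a constant input."""
    membrane = 0.0
    spikes = 0
    for _ in range(n_steps):
        membrane += input_current      # integrate the constant drive
        if membrane >= threshold:
            spikes += 1
            membrane -= threshold      # reset by subtraction keeps the residual charge
    return spikes / n_steps


def relu(x):
    """Rectified linear activation of the source ANN unit."""
    return max(0.0, x)


# For inputs no larger than the threshold, the spike rate tracks relu(x).
rate_positive = if_firing_rate(0.3)
rate_negative = if_firing_rate(-0.5)   # never reaches threshold, so no spikes
```

The approximation improves with more time steps, which is one way to see the accuracy and efficiency trade-off that the abstract raises.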
Purpose: The primary purpose of this study was to explore audiovisual speech perception strategies.
Spikes are the currency in central nervous systems for information transmission and processing. They are also believed to play an essential role in the low power consumption of biological systems, whose efficiency attracts increasing attention to the field of neuromorphic computing. However, efficient processing and learning of discrete spikes remains a challenging problem.
IEEE Trans Neural Netw Learn Syst
February 2021
The capability for environmental sound recognition (ESR) can determine the fitness of individuals by helping them avoid dangers or pursue opportunities when critical sound events occur. The fundamental principles of biological systems that give rise to such a remarkable ability remain mysterious. Additionally, the practical importance of ESR has attracted an increasing amount of research attention, but the chaotic and nonstationary nature of the signals continues to make it a challenging task.
Comput Intell Neurosci
May 2020
Aspect-level sentiment classification aims to identify the sentiment polarity of a review expressed toward a target. In recent years, neural network-based methods have achieved success in aspect-level sentiment classification, and these methods fall into two types: the first takes the target information into account for context modelling, and the second models the context without considering the target information. It is concluded that the former is better than the latter.
Understanding brain processing mechanisms from the perception of speech sounds to high-level semantic processing is vital for effective human-robot communication. In this study, 128-channel electroencephalograph (EEG) signals were recorded when subjects were listening to real and pseudowords in Mandarin. By using an EEG source reconstruction method and a sliding-window Granger causality analysis, we analyzed the dynamic brain connectivity patterns.
In this paper, a novel imperceptible, fragile and blind watermark scheme is proposed for speech tampering detection and self-recovery. The embedded watermark data for content recovery is calculated from the original discrete cosine transform (DCT) coefficients of the host speech. The watermark information is shared across a group of frames instead of being stored in one frame.
One of the long-standing issues in neurolinguistic research is about the neural basis of word representation, concerning whether grammatical classification or semantic difference causes the neural dissociation of brain activity patterns when processing different word categories, especially nouns and verbs. To disentangle this puzzle, four orthogonalized word categories in Chinese: unambiguous nouns (UN), unambiguous verbs (UV), ambiguous words with noun-biased semantics (AN), and ambiguous words with verb-biased semantics (AV) were adopted in an auditory task for recording electroencephalographic (EEG) signals from 128 electrodes on the scalps of twenty-two subjects. With the advanced current density reconstruction (CDR) algorithm and the constraint of standardized low-resolution electromagnetic tomography, the spatiotemporal brain dynamics of word processing were explored with the results that in multiple time periods including P1 (60-90 ms), N1 (100-140 ms), P200 (150-250 ms) and N400 (350-450 ms), noun-verb dissociation over the parietal-occipital and frontal-central cortices appeared not only between the UN-UV grammatical classes but also between the grammatically identical but semantically different AN-AV pairs.
Comput Math Methods Med
March 2017
The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to its good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results, especially for large amounts of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper.
J Craniomaxillofac Surg
November 2016
Purpose: Endoscope-assisted surgery has widely been adopted as a basic surgical procedure, with various training systems using virtual reality developed for this procedure. In the present study, a basic training system comprising virtual reality for the removal of submandibular glands under endoscope assistance was developed. The efficacy of the training system was verified in novice oral surgeons.
Previous studies have found that the velum in speech production may not only serve as a binary switch with on-off states for nasal and non-nasal sounds, but also partially alter the acoustic characteristics of non-nasalized sounds. The present study investigated the unique functions of the velum in the production of non-nasalized sounds by using morphological, mechanical, and acoustical measurements. Magnetic resonance imaging movies obtained from three Japanese speakers were used to measure the behaviors of the velum and dynamic changes in the pseudo-volume of the pharyngeal cavity during utterances of voiced stops and vowels.
Articulatory information can support learning or remediating pronunciation of a second language (L2). This paper describes an electromagnetic articulometer-based visual-feedback approach using an articulatory target presented in real-time to facilitate L2 pronunciation learning. This approach trains learners to adjust articulatory positions to match targets for an L2 vowel estimated from productions of vowels that overlap in both L1 and L2.
Community detection in complex networks is a fundamental data analysis task in various domains, and how to effectively find overlapping communities in real applications is still a challenge. In this work, we propose a new unified model and method for finding the best overlapping communities on the basis of the associated node and link partitions derived from the same framework. Specifically, we first describe a unified model that accommodates node and link communities (partitions) together, and then present a nonnegative matrix factorization method to learn the parameters of the model.