Separation of speech mixtures, often referred to as the cocktail party problem, has been studied for decades. In many source separation tasks, the separation method is limited by the assumption of at least as many sensors as sources. Further, many methods require that the number of signals within the recorded mixtures be known in advance. In many real-world applications, these limitations are too restrictive. We propose a novel method for underdetermined blind source separation using an instantaneous mixing model which assumes closely spaced microphones. Two source separation techniques are combined: independent component analysis (ICA) and binary time-frequency (T-F) masking. By estimating binary masks from the outputs of an ICA algorithm, basis speech signals can be extracted iteratively from a convolutive mixture. The basis signals are afterwards improved by grouping similar signals. Using two microphones, we can separate, in principle, an arbitrary number of mixed speech signals. We show separation results for mixtures with as many as seven speech signals under instantaneous conditions. We also show that the proposed method can segregate speech signals under reverberant conditions, and we compare it to another state-of-the-art algorithm. The number of source signals is not assumed to be known in advance, and the extracted signals can be maintained as stereo signals.
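The mask-estimation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the two ICA outputs are already available as T-F magnitude grids and that a T-F unit is assigned to whichever output dominates it by a factor `tau`. The function names, the threshold `tau`, and the toy numbers are hypothetical and are not taken from the paper. The ICA step itself, the iteration over masked signals, and the grouping of similar basis signals are omitted.

```python
# Toy sketch of binary T-F mask estimation from two ICA outputs.
# Assumption: all names (estimate_binary_masks, apply_mask, tau) and all
# numbers are illustrative; this is not the authors' implementation.

def estimate_binary_masks(y1_mag, y2_mag, tau=1.0):
    """Compare the T-F magnitudes of two ICA outputs unit by unit.

    A unit goes to mask 1 when |Y1| > tau * |Y2| and to mask 2 when
    |Y2| > tau * |Y1|. Ties (and, for tau > 1, near-ties) go to neither.
    """
    mask1 = [[1 if a > tau * b else 0 for a, b in zip(r1, r2)]
             for r1, r2 in zip(y1_mag, y2_mag)]
    mask2 = [[1 if b > tau * a else 0 for a, b in zip(r1, r2)]
             for r1, r2 in zip(y1_mag, y2_mag)]
    return mask1, mask2


def apply_mask(mask, x_tf):
    """Keep only the T-F units of x_tf where the mask is 1."""
    return [[m * x for m, x in zip(mask_row, x_row)]
            for mask_row, x_row in zip(mask, x_tf)]


# Rows are frequency bins, columns are time frames.
y1_mag = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.5]]   # |Y1|: first ICA output
y2_mag = [[0.1, 0.7, 0.5], [0.6, 0.1, 0.4]]   # |Y2|: second ICA output

# T-F representations of the two microphone signals (real-valued here for
# brevity; actual STFT coefficients would be complex).
x_left = [[1.0, 0.8, 1.0], [0.8, 0.9, 0.9]]
x_right = [[0.9, 0.9, 1.0], [0.9, 0.8, 0.8]]

mask1, mask2 = estimate_binary_masks(y1_mag, y2_mag, tau=1.0)

# Applying the same mask to both microphone channels keeps each extracted
# signal as a stereo pair, which could be passed to ICA again.
stereo1 = (apply_mask(mask1, x_left), apply_mask(mask1, x_right))
stereo2 = (apply_mask(mask2, x_left), apply_mask(mask2, x_right))
```

Applying each mask to both microphone signals rather than to the ICA outputs is what lets the extracted signals stay stereo in this sketch, so a two-channel separation step could be run on them again.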
DOI: http://dx.doi.org/10.1109/TNN.2007.911740
Diagnostics (Basel)
December 2024
GITA Lab., Faculty of Engineering, University of Antioquia, Medellín 050010, Colombia.
Background/objectives: Parkinson's disease (PD) affects more than 6 million people worldwide. Its accurate diagnosis and monitoring are key factors to reduce its economic burden. Typical approaches consider either speech signals or video recordings of the face to automatically model abnormal patterns in PD patients.
Polymers (Basel)
December 2024
Chongqing Academy of Metrology and Quality Inspection, Chongqing 401120, China.
Dynamic hydrogels have attracted considerable attention in the application of flexible electronics, as they possess injectable and self-healing abilities. However, it is still a challenge to combine high conductivity and antibacterial properties into dynamic hydrogels. In this work, we fabricated a type of dynamic hydrogel based on acylhydrazone bonds between thermo-responsive copolymer and silver nanoparticles (AgNPs) functionalized with hydrazide groups.
J Neurosci
January 2025
Department of Psychology, Chinese University of Hong Kong, Hong Kong SAR, China
The extraction and analysis of pitch underpin speech and music recognition, sound segregation, and other auditory tasks. Perceptually, pitch can be represented as a helix composed of two factors: height monotonically aligns with frequency, while chroma cyclically repeats at doubled frequencies. Although the early perceptual and neurophysiological mechanisms for extracting pitch from acoustic signals have been extensively investigated, the equally essential subsequent stages that bridge to high-level auditory cognition remain less well understood.
J Neural Eng
January 2025
Department of Pediatrics, Oregon Health & Science University, 3181 SW Sam Jackson Park Rd, Portland, Oregon, 97239-3098, UNITED STATES.
Objective: The RSVP Keyboard is a non-implantable, event-related potential-based brain-computer interface (BCI) system designed to support communication access for people with severe speech and physical impairments. Here we introduce Inquiry Preview, a new RSVP Keyboard interface incorporating switch input for users with some voluntary motor function, and describe its effects on typing performance and other outcomes.
Approach: Four individuals with disabilities participated in the collaborative design of possible switch input applications for the RSVP Keyboard, leading to the development of Inquiry Preview and a method of fusing switch input with language model and electroencephalography (EEG) evidence for typing.
PLoS One
January 2025
Dept. of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany.
Music pre-processing methods are currently becoming a recognized area of research with the goal of making music more accessible to listeners with a hearing impairment. Our previous study showed that hearing-impaired listeners preferred spectrally manipulated multi-track mixes. Nevertheless, the acoustical basis of mixing for hearing-impaired listeners remains poorly understood.