Fully learnable deep wavelet transform for unsupervised monitoring of high-frequency time series.

Proc Natl Acad Sci U S A

Chair of Intelligent Maintenance Systems, ETH Zürich, 8049 Zürich, Switzerland

Published: February 2022

AI Article Synopsis

  • High-frequency (HF) signals are important for monitoring industrial assets, but traditional deep-learning tools often struggle with their size and complexity.
  • This paper presents a fully unsupervised deep-learning framework that extracts meaningful representations from raw HF signals by incorporating properties of the fast discrete wavelet transform (FDWT).
  • The proposed architecture is designed to be learnable, allowing for effective denoising and feature extraction without needing prior knowledge or additional processing, and it outperforms existing methods in various machine-learning tasks on sound datasets.

Article Abstract

High-frequency (HF) signals are ubiquitous in the industrial world and are of great use for monitoring of industrial assets. Most deep-learning tools are designed for inputs of fixed and/or very limited size and many successful applications of deep learning to the industrial context use as inputs extracted features, which are a manually and often arduously obtained compact representation of the original signal. In this paper, we propose a fully unsupervised deep-learning framework that is able to extract a meaningful and sparse representation of raw HF signals. We embed in our architecture important properties of the fast discrete wavelet transform (FDWT) such as 1) the cascade algorithm; 2) the conjugate quadrature filter property that links together the wavelet, the scaling, and transposed filter functions; and 3) the coefficient denoising. Using deep learning, we make this architecture fully learnable: Both the wavelet bases and the wavelet coefficient denoising become learnable. To achieve this objective, we propose an activation function that performs a learnable hard thresholding of the wavelet coefficients. With our framework, the denoising FDWT becomes a fully learnable unsupervised tool that does not require any type of pre- or postprocessing or any prior knowledge on wavelet transform. We demonstrate the benefits of embedding all these properties on three machine-learning tasks performed on open-source sound datasets. We perform an ablation study of the impact of each property on the performance of the architecture, achieve results well above baseline, and outperform other state-of-the-art methods.
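The three properties named in the abstract map naturally onto standard deep-learning building blocks, and a compact sketch may help make the idea concrete. The code below is an illustrative PyTorch reconstruction, not the authors' released implementation: the filter length, the number of decomposition levels, the sigmoid-gated surrogate for hard thresholding, and all class names are assumptions made for this example. It shows (1) the cascade algorithm, with the approximation branch decomposed recursively; (2) a conjugate-quadrature-filter coupling, so the high-pass filter is tied to the learnable low-pass filter by the alternating-flip relation g[k] = (-1)^k h[L-1-k]; and (3) a learnable thresholding activation applied to each set of detail coefficients.

# Illustrative sketch (not the authors' code) of a learnable denoising FDWT.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableHardThreshold(nn.Module):
    """Smooth, learnable surrogate of hard thresholding: coefficients whose
    magnitude falls below the learned threshold are (softly) gated to zero."""

    def __init__(self, init_threshold: float = 0.1, sharpness: float = 10.0):
        super().__init__()
        self.threshold = nn.Parameter(torch.tensor(init_threshold))
        self.sharpness = sharpness  # fixed steepness of the sigmoid gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.sharpness * (x.abs() - self.threshold))
        return x * gate  # ~x when |x| >> threshold, ~0 when |x| << threshold


class LearnableDWT(nn.Module):
    """One learnable wavelet analysis level: the low-pass filter is the only
    free filter parameter; the high-pass filter is tied to it by the
    conjugate quadrature (alternating-flip) relation."""

    def __init__(self, filter_length: int = 8):
        super().__init__()
        self.h = nn.Parameter(torch.randn(filter_length) * 0.1)  # learnable low-pass

    def filters(self):
        L = self.h.numel()
        signs = torch.tensor([(-1.0) ** k for k in range(L)], device=self.h.device)
        g = signs * torch.flip(self.h, dims=[0])  # CQF-coupled high-pass
        return self.h.view(1, 1, -1), g.view(1, 1, -1)

    def forward(self, x: torch.Tensor):
        # x: (batch, 1, time); stride-2 convolutions implement the decimated FDWT step
        h, g = self.filters()
        approx = F.conv1d(x, h, stride=2, padding=h.shape[-1] // 2)
        detail = F.conv1d(x, g, stride=2, padding=g.shape[-1] // 2)
        return approx, detail


class DeepDenoisingDWT(nn.Module):
    """Cascade algorithm: the approximation branch is decomposed recursively,
    and each set of detail coefficients passes through its own learnable
    thresholding activation."""

    def __init__(self, levels: int = 4, filter_length: int = 8):
        super().__init__()
        self.dwts = nn.ModuleList([LearnableDWT(filter_length) for _ in range(levels)])
        self.thresholds = nn.ModuleList([LearnableHardThreshold() for _ in range(levels)])

    def forward(self, x: torch.Tensor):
        details = []
        approx = x
        for dwt, thr in zip(self.dwts, self.thresholds):
            approx, detail = dwt(approx)
            details.append(thr(detail))
        return approx, details  # sparse multiresolution representation of the raw signal


# Usage sketch on a batch of raw high-frequency segments
model = DeepDenoisingDWT(levels=4)
signal = torch.randn(16, 1, 4096)  # e.g., 16 one-channel HF segments
approx, details = model(signal)
print(approx.shape, [d.shape for d in details])

Because both the wavelet filters and the thresholds are ordinary parameters, the whole transform can be trained end to end without any prior choice of wavelet basis, which is the property the abstract emphasizes.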


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8872732
DOI: http://dx.doi.org/10.1073/pnas.2106598119

Publication Analysis

Top Keywords

fully learnable (12)
wavelet transform (12)
deep learning (8)
coefficient denoising (8)
wavelet (7)
fully (4)
learnable deep (4)
deep wavelet (4)
transform unsupervised (4)
unsupervised monitoring (4)

Similar Publications

Traditional multimodal contrastive learning brings text and its corresponding image closer together as a positive pair, where the text typically consists of fixed sentence structures or specific descriptive statements, and the image features are generally global features (with some fine-grained work using local features). Similar to unimodal self-supervised contrastive learning, this approach can be seen as enforcing a strict identity constraint in a multimodal context. However, due to the inherent complexity of remote sensing images, which cannot be easily described in a single sentence, and the fact that remote sensing images contain rich ancillary information beyond just object features, this strict identity constraint may be insufficient.
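For readers unfamiliar with the setup this snippet refers to, the "strict identity constraint" is the standard symmetric contrastive objective in which each text is a positive only for its own image and a negative for every other image in the batch. The following is a minimal, generic sketch of that objective (a CLIP-style InfoNCE loss); the embedding dimension, temperature, and function name are illustrative assumptions, and this is not the cited paper's implementation.

# Generic multimodal contrastive loss: the diagonal of the similarity matrix
# holds the positive (identity) pairs, everything else is a negative.
import torch
import torch.nn.functional as F


def clip_style_contrastive_loss(image_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    # Normalize so the dot product is a cosine similarity
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature              # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = positives
    loss_i2t = F.cross_entropy(logits, targets)    # image -> matching text
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> matching image
    return 0.5 * (loss_i2t + loss_t2i)


# Usage: 8 image/text pairs with 256-dimensional embeddings
loss = clip_style_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))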


Many optical applications require accurate control over a beam's spatial intensity profile, in particular, achieving uniform irradiance across a target area can be critically important for nonlinear optical processes such as laser machining. This paper introduces a novel control algorithm for Digital Micromirror Devices (DMDs) that simultaneously and adaptively modulates both the intensity and the spatial intensity profile of an incident beam with random and intricate intensity variations in a single step. The algorithm treats each micromirror within the DMD as an independent Bernoulli distribution characterized by a learnable parameter.
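The per-mirror Bernoulli idea mentioned above can be illustrated with a short, hedged sketch: learn one logit per micromirror, sample binary on/off mirror states, and optimize the locally averaged pattern toward a target intensity profile. The straight-through gradient estimator, the box-blur stand-in for the optics, the grid size, and the mean-squared-error objective are assumptions made for illustration only, not the cited paper's algorithm.

# Each micromirror is an independent Bernoulli variable with a learnable parameter.
import torch
import torch.nn.functional as F

H, W = 64, 64
logits = torch.zeros(H, W, requires_grad=True)   # learnable Bernoulli parameters (one per mirror)
target = torch.rand(H, W)                        # desired normalized intensity profile
optimizer = torch.optim.Adam([logits], lr=0.05)

for step in range(200):
    probs = torch.sigmoid(logits)
    sample = torch.bernoulli(probs.detach())     # binary mirror states (on/off)
    # Straight-through estimator: forward uses the hard sample, backward flows through probs
    mirrors = sample + probs - probs.detach()
    # Crude optical model: local averaging stands in for the finite spot size
    pattern = F.avg_pool2d(mirrors[None, None], kernel_size=4, stride=1, padding=2)
    pattern = pattern[0, 0, :H, :W]
    loss = F.mse_loss(pattern, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()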


Accurate segmentation of skin lesions within dermoscopic images plays a crucial role in the timely identification of skin cancer for computer-aided diagnosis on mobile platforms. However, varying shapes of the lesions, lack of defined edges, and the presence of obstructions such as hair strands and marker colours make this challenge more complex. Additionally, skin lesions often exhibit subtle variations in texture and colour that are difficult to differentiate from surrounding healthy skin, necessitating models that can capture both fine-grained details and broader contextual information.


Vector field attention for deformable image registration.

J Med Imaging (Bellingham)

November 2024

Johns Hopkins University, Department of Electrical and Computer Engineering, Baltimore, Maryland, United States.

Article Synopsis
  • Deformable image registration aligns a moving image to a fixed image; deep-learning approaches make this alignment faster and more accurate.
  • VFA (Vector Field Attention) is a new method that improves efficiency by directly retrieving location correspondences without needing complex learnable parameters.
  • Testing on various datasets shows that VFA performs as well or better than other leading methods, making it a promising approach for future applications in image registration.

VLFATRollout: Fully transformer-based classifier for retinal OCT volumes.

Comput Med Imaging Graph

December 2024

Christian Doppler Laboratory for Artificial Intelligence in Retina, Department of Ophthalmology and Optometry, Medical University of Vienna, Austria; Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria.

Background And Objective: Despite the promising capabilities of 3D transformer architectures in video analysis, their application to high-resolution 3D medical volumes encounters several challenges. One major limitation is the high number of 3D patches, which reduces the efficiency of the global self-attention mechanisms of transformers. Additionally, background information can distract vision transformers from focusing on crucial areas of the input image, thereby introducing noise into the final representation.

