Multimodal deep learning using on-chip diffractive optics with in situ training capability.

Junwei Cheng, Chaoran Huang, Jialong Zhang, Bo Wu, Wenkai Zhang, Xinyu Liu, Jiahui Zhang, Yiyi Tang, Hailong Zhou, Qiming Zhang, Min Gu, Jianji Dong, Xinliang Zhang

Nat Commun

Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China.

Published: July 2024

Multimodal deep learning plays a pivotal role in processing and learning from diverse data types in artificial intelligence generated content (AIGC). However, most photonic neuromorphic processors for deep learning can handle only a single data modality (either vision or audio) because large-scale parameter training in the optical domain has been lacking. Here, we propose and demonstrate a trainable diffractive optical neural network (TDONN) chip based on on-chip diffractive optics with massive tunable elements to address this limitation. The TDONN chip includes one input layer, five hidden layers, and one output layer, and a single forward propagation yields the inference result without frequent optical-electrical conversion. A customized stochastic gradient descent algorithm and a dropout mechanism are developed for photonic neurons to realize in situ training and fast convergence in the optical domain. The TDONN chip achieves a potential throughput of 217.6 tera-operations per second (TOPS) with high computing density (447.7 TOPS/mm²), high system-level energy efficiency (7.28 TOPS/W), and low optical latency (30.2 ps). The TDONN chip has successfully implemented four-class classification in different modalities (vision, audio, and touch) and achieved 85.7% accuracy on multimodal test sets. Our work opens up a new avenue for multimodal deep learning with integrated photonic processors, providing a potential solution for low-power large AI models using photonic technology.
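The in situ training idea above (stochastic gradient descent plus dropout, driven only by forward optical propagation) can be illustrated with a small numerical sketch. Everything in it is an assumption for illustration: the hypothetical `ToyDiffractiveNet` model, the random unitary "mixers" standing in for on-chip diffraction between layers, the choice of simultaneous-perturbation (SPSA) gradient estimation as the SGD variant, and the reading of dropout as randomly freezing a fraction of phase shifters at each update. It is not the authors' customized algorithm or chip geometry; it only shows how a gradient can be estimated from forward measurements alone, without backpropagating through the optics.

```python
import numpy as np

# Toy numerical sketch only (hypothetical, NOT the TDONN chip): each layer applies
# tunable phase shifts followed by a fixed random unitary that stands in for
# on-chip diffraction between layers.


def random_unitary(n, rng):
    """Haar-random unitary via QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))


class ToyDiffractiveNet:
    def __init__(self, n_modes, n_layers, rng):
        # One tunable phase per mode per layer (hypothetical "tunable elements").
        self.phases = rng.uniform(0.0, 2 * np.pi, size=(n_layers, n_modes))
        self.mixers = [random_unitary(n_modes, rng) for _ in range(n_layers)]

    def forward(self, field, phases):
        # A single forward pass through all layers; detectors see intensity only.
        for phi, u in zip(phases, self.mixers):
            field = u @ (np.exp(1j * phi) * field)
        return np.abs(field) ** 2


def dataset_loss(net, phases, inputs, labels, n_classes):
    """Mean cross-entropy over the first n_classes output ports (the "detectors")."""
    total = 0.0
    for x, y in zip(inputs, labels):
        intensity = net.forward(x, phases)[:n_classes]
        p = intensity / (intensity.sum() + 1e-12)
        total -= np.log(p[y] + 1e-12)
    return total / len(labels)


def train_in_situ(net, inputs, labels, n_classes, steps, lr, delta, drop_rate, rng):
    """SPSA-style SGD: the gradient is estimated from two forward measurements per step."""
    history = []
    for _ in range(steps):
        # Random +/-1 perturbation applied to every phase shifter at once.
        direction = rng.choice([-1.0, 1.0], size=net.phases.shape)
        loss_plus = dataset_loss(net, net.phases + delta * direction, inputs, labels, n_classes)
        loss_minus = dataset_loss(net, net.phases - delta * direction, inputs, labels, n_classes)
        grad_est = (loss_plus - loss_minus) / (2 * delta) * direction
        # Dropout-style mask (hypothetical reading): freeze a random subset this step.
        keep = rng.random(net.phases.shape) >= drop_rate
        net.phases = net.phases - lr * grad_est * keep
        history.append(dataset_loss(net, net.phases, inputs, labels, n_classes))
    return history


rng = np.random.default_rng(0)
n_modes, n_layers, n_classes = 8, 3, 4
net = ToyDiffractiveNet(n_modes, n_layers, rng)

# Four random, normalized complex input fields, one per class.
raw = rng.normal(size=(n_classes, n_modes)) + 1j * rng.normal(size=(n_classes, n_modes))
inputs = raw / np.linalg.norm(raw, axis=1, keepdims=True)
labels = list(range(n_classes))

initial = dataset_loss(net, net.phases, inputs, labels, n_classes)
history = train_in_situ(net, inputs, labels, n_classes,
                        steps=2000, lr=0.02, delta=0.05, drop_rate=0.2, rng=rng)
print(f"loss before: {initial:.3f}, after: {history[-1]:.3f}")
```

SPSA needs only two loss measurements per update regardless of how many phase shifters there are, which is why measurement-only schemes of this kind suit hardware whose internal gradients cannot be read out directly. The learning rate, perturbation size, and drop rate above are arbitrary illustrative values.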

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11266606
DOI: http://dx.doi.org/10.1038/s41467-024-50677-3

Similar Publications

DeepGOMeta for functional insights into microbial communities using deep learning-based protein function prediction.

Sci Rep

December 2024

KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.

Rund Tawfiq, Kexin Niu, Robert Hoehndorf, Maxat Kulmanov

Analyzing microbial samples remains computationally challenging due to their diversity and complexity. The lack of robust de novo protein function prediction methods exacerbates the difficulty in deriving functional insights from these samples. Traditional prediction methods, dependent on homology and sequence similarity, often fail to predict functions for novel proteins and proteins without known homologs.

Continual deep reinforcement learning with task-agnostic policy distillation.

Sci Rep

December 2024

Department of Informatics, University of Hamburg, Hamburg, Germany.

Muhammad Burhan Hafez, Kerim Erekmen

Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space.

Optimizing VGG16 deep learning model with enhanced hunger games search for logo classification.

Sci Rep

December 2024

Department of Computer Science, Birzeit University, P.O. Box 14, Birzeit, West Bank, Palestine.

Mohammed Hussain, Thaer Thaher, Mohamed Basel Almourad, Majdi Mafarja

Accurate classification of logos is a challenging task in image recognition due to variations in logo size, orientation, and background complexity. Deep learning models, such as VGG16, have demonstrated promising results in handling such tasks. However, their performance is highly dependent on optimal hyperparameter settings, whose fine-tuning is both labor-intensive and time-consuming.

Attention-guided convolutional network for bias-mitigated and interpretable oral lesion classification.

Sci Rep

December 2024

Faculty of Dental Medicine and Oral Health Sciences, McGill University, Montreal, Canada.

Adeetya Patel, Camille Besombes, Theerthika Dillibabu, Mridul Sharma, Faleh Tamimi

Accurate diagnosis of oral lesions, early indicators of oral cancer, is a complex clinical challenge. Recent advances in deep learning have demonstrated potential in supporting clinical decisions. This paper introduces a deep learning model for classifying oral lesions, focusing on accuracy, interpretability, and reducing dataset bias.

A two-level resolution neural network with enhanced interpretability for freeway traffic forecasting.

Sci Rep

December 2024

Department of Civil Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

Semin Kwak, Danya Li, Nikolas Geroliminis

Deep learning models are widely used for traffic forecasting on freeways due to their ability to learn complex temporal and spatial relationships. In particular, graph neural networks, which integrate graph theory into deep learning, have become popular for modeling traffic sensor networks. However, traditional graph convolutional networks (GCNs) face limitations in capturing long-range spatial correlations, which can hinder accurate long-term predictions.
