With the exponential growth of network resources, recommendation systems have become an effective means of combating information overload. Click-through rate (CTR) prediction plays a crucial role in intelligent recommendation systems. Most CTR models employ a parallel network architecture to capture explicit and implicit feature interactions. However, existing models overlook two aspects. First, most models focus only on pairwise feature interactions and place no emphasis on modeling unary terms. Second, most models feed features indiscriminately into the parallel networks, resulting in oversharing of the network input. We propose a disentangled self-attention neural network based on information sharing (DSAN) for CTR prediction to model complex feature interactions. First, an embedding layer transforms high-dimensional sparse features into low-dimensional dense matrices. Then, disentangled multi-head self-attention learns the relationships between different features, and its output is fed into a parallel network architecture. Finally, a shared interaction layer addresses the problem of insufficient information sharing in parallel networks. Experiments on two real-world datasets show that the proposed method surpasses existing methods in predictive accuracy.
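The first two stages described above (embedding lookup, then disentangled self-attention) can be illustrated with a minimal sketch. The abstract gives no equations, so everything here is an assumption: the attention weight is taken to be the sum of a pairwise term (softmax over mean-centered query-key products) and a unary term (softmax over the mean query against each key), which is one common way to disentangle attention and not necessarily the DSAN formulation. A single head is used for brevity, and all names, sizes, and random weights are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(weight, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [dot(row, vec) for row in weight]


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed(sample, tables):
    """Embedding layer: look up one dense vector per categorical field."""
    return [tables[field][value] for field, value in enumerate(sample)]


def disentangled_attention(fields, w_q, w_k, w_v):
    """Single-head attention with separate pairwise and unary terms (assumed form)."""
    queries = [project(w_q, f) for f in fields]
    keys = [project(w_k, f) for f in fields]
    values = [project(w_v, f) for f in fields]
    mu_q, mu_k = mean_vector(queries), mean_vector(keys)
    centered_keys = [[k - m for k, m in zip(key, mu_k)] for key in keys]

    # Unary term: depends only on the key, so every query shares it.
    unary = softmax([dot(mu_q, key) for key in keys])

    outputs, weights = [], []
    for query in queries:
        centered_q = [q - m for q, m in zip(query, mu_q)]
        # Pairwise term: attention between mean-centered queries and keys.
        pairwise = softmax([dot(centered_q, ck) for ck in centered_keys])
        row = [p + u for p, u in zip(pairwise, unary)]
        weights.append(row)
        # Weighted sum of value vectors, one output dimension at a time.
        outputs.append([dot(row, col) for col in zip(*values)])
    return outputs, weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, vocab_sizes = 4, [10, 5, 7]  # hypothetical embedding size and field vocabularies
tables = [random_matrix(v, dim, rng) for v in vocab_sizes]
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

fields = embed([3, 1, 6], tables)  # one sample: a category id per field
outputs, weights = disentangled_attention(fields, w_q, w_k, w_v)
```

In this sketch `outputs` holds one attended vector per field, which a model of this kind would then pass to its parallel networks. The shared interaction layer is not sketched because the abstract does not describe its mechanics.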

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10803010
DOI: http://dx.doi.org/10.7717/peerj-cs.1764

Similar Publications

Latent Semantic and Disentangled Attention.

IEEE Trans Pattern Anal Mach Intell

December 2024

Sequential learning using transformers has achieved state-of-the-art performance in natural language tasks and many others. The key to this success is multi-head self-attention, which encodes and gathers features from the individual tokens of an input sequence. Mapping or decoding is then performed via cross-attention to produce an output sequence.

The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales, which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification.

Previous multi-modal transformers for RGB-D salient object detection (SOD) generally directly connect all patches from two modalities to model cross-modal correlation and perform multi-modal combination without differentiation, which can lead to confusing and inefficient fusion. Instead, we disentangle the cross-modal complementarity from two views to reduce cross-modal fusion ambiguity: 1) Context disentanglement. We argue that modeling long-range dependencies across modalities as done before is uninformative due to the severe modality gap.

Graph neural networks have been widely used by multivariate time series-based anomaly detection algorithms to model the dependencies of system sensors. Previous studies have focused on learning the fixed dependency patterns between sensors. However, they ignore that the inter-sensor and temporal dependencies of time series are highly nonlinear and dynamic, leading to inevitable false alarms.
