Many applications use data that are better represented in binary matrix form, such as click-stream data, market basket data, document-term data, and user-permission data in access control. Matrix factorization methods are widely used for the analysis of high-dimensional data, as they automatically extract sparse and meaningful features from data vectors. However, existing matrix factorization methods do not work well for binary data. One crucial limitation is interpretability: many matrix factorization methods decompose an input matrix into matrices with fractional or even negative components, which are hard to interpret in many real settings. Some methods, such as binary matrix factorization, do restrict the decomposed matrices to binary values. However, these models are not flexible enough to accommodate some data analysis tasks, such as trading off summary size against quality and discriminating between different types of approximation errors. To address these issues, this article presents weighted rank-one binary matrix factorization, which approximates a binary matrix by the product of two binary vectors, with parameters controlling different types of approximation errors. By systematically running weighted rank-one binary matrix factorization, one can effectively perform various binary data analysis tasks, such as compression, clustering, and pattern discovery. Theoretical properties of weighted rank-one binary matrix factorization are investigated, and its connections to problems in other research domains are examined. As weighted rank-one binary matrix factorization is NP-hard in general, efficient and effective algorithms are presented. Extensive studies on applications of weighted rank-one binary matrix factorization are also conducted.
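
To make the idea in the abstract concrete, the following is a minimal sketch of a weighted rank-one approximation of a binary matrix. It is not the article's algorithm: the cost function, the alternating-update heuristic, the densest-row seed, and the names `weighted_cost`, `rank_one_bmf`, `w_miss`, and `w_false` are all assumptions made for illustration. The assumed cost charges `w_miss` for every 1 in the input that the product of the two binary vectors leaves uncovered, and `w_false` for every 0 that it covers.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def weighted_cost(a: Matrix, x: List[int], y: List[int],
                  w_miss: float, w_false: float) -> float:
    """Assumed objective: weighted count of the two error types."""
    cost = 0.0
    for i, row in enumerate(a):
        for j, a_ij in enumerate(row):
            approx = x[i] & y[j]  # entry (i, j) of the rank-one product
            if a_ij == 1 and approx == 0:
                cost += w_miss    # a 1 that the approximation misses
            elif a_ij == 0 and approx == 1:
                cost += w_false   # a 0 that the approximation covers
    return cost


def rank_one_bmf(a: Matrix, w_miss: float = 1.0, w_false: float = 1.0,
                 max_iters: int = 50) -> Tuple[List[int], List[int]]:
    """Local-search heuristic (an assumption, not the article's method).

    Alternately re-optimizes the row vector x and the column vector y
    while the other is held fixed, until neither changes.
    """
    n_rows, n_cols = len(a), len(a[0])
    # Hypothetical seeding choice: use the densest row as the column pattern.
    seed = max(range(n_rows), key=lambda i: sum(a[i]))
    x, y = [0] * n_rows, list(a[seed])
    for _ in range(max_iters):
        # Select row i only if the 1s it gains outweigh the 0s it covers.
        new_x = []
        for i in range(n_rows):
            ones = sum(a[i][j] for j in range(n_cols) if y[j])
            zeros = sum(y) - ones
            new_x.append(1 if w_miss * ones > w_false * zeros else 0)
        # Same rule for each column, using the freshly updated rows.
        new_y = []
        for j in range(n_cols):
            ones = sum(a[i][j] for i in range(n_rows) if new_x[i])
            zeros = sum(new_x) - ones
            new_y.append(1 if w_miss * ones > w_false * zeros else 0)
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y


example = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
x_eq, y_eq = rank_one_bmf(example)              # equal weights
x_fa, y_fa = rank_one_bmf(example, w_false=3.0)  # covering a 0 costs more
```

With equal weights the sketch selects the first three rows and columns and tolerates one covered 0. Raising `w_false` to 3.0 makes it drop the third row instead, which illustrates how the weights steer the trade-off between the two error types. Because the problem is NP-hard in general, a heuristic like this one can stop at a local optimum.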

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7695232
DOI: http://dx.doi.org/10.1145/3386599

Publication Analysis

Top Keywords

matrix factorization: 44
binary matrix: 36
weighted rank-one: 24
rank-one binary: 24
factorization methods: 16
matrix: 14
binary: 13
factorization: 11
data: 10
applications weighted: 8

Similar Publications

Background: Recent years have seen persistently poor prognoses for glioma patients. Therefore, exploring the molecular subtyping of gliomas, identifying novel prognostic biomarkers, and understanding the characteristics of their immune microenvironments are crucial for improving treatment strategies and patient outcomes.

Methods: We integrated glioma datasets from multiple sources, employing Non-negative Matrix Factorization (NMF) to cluster samples and filter for differentially expressed metabolic genes.

Background: Hepatocellular carcinoma (LIHC) poses a significant health challenge worldwide, primarily due to late-stage diagnosis and the limited effectiveness of current therapies. Cancer stem cells are known to play a role in tumor development, metastasis, and resistance to treatment. A thorough understanding of genes associated with stem cells is crucial for improving the diagnostic precision of LIHC and for the advancement of effective immunotherapy approaches.

Drug development is known to be a costly and time-consuming process, which is prone to high failure rates. Drug repurposing allows drug discovery by reusing already approved compounds. The outcomes of past clinical trials can be used to predict novel drug-disease associations by leveraging drug- and disease-related similarities.

Background: Hepatocellular carcinoma (HCC) is a common malignant tumor of the digestive system with a high incidence that seriously threatens patients' lives and health. However, with the rise and application of new treatments, such as immunotherapy, there are still some restrictions in the treatment and diagnosis of HCC, and the therapeutic effects on patients are not ideal.

Methods: Two single-cell RNA sequencing (scRNA-seq) datasets from HCC patients, encompassing 25,189 cells, were analyzed in the study.

Background: Migraine is a complex neurological disorder characterized by recurrent episodes of severe headaches. Although genetic factors have been implicated, the precise molecular mechanisms, particularly gene expression patterns in migraine-associated brain regions, remain unclear. This study applies machine learning techniques to explore region-specific gene expression profiles and identify critical gene programs and transcription factors linked to migraine pathogenesis.
