Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, such as protein-protein interactions and gene expression. However, to perform robustly on recently emerging large health datasets, new biclustering algorithms must be scalable and fast. We present a rapid unsupervised biclustering (RUBic) algorithm that achieves this objective with a novel encoding and search strategy. RUBic significantly reduces computational overhead on both synthetic and experimental datasets relative to several state-of-the-art biclustering algorithms. On 100 synthetic binary datasets, our method took [Formula: see text] s to extract 494,872 biclusters. On the human PPI database of size [Formula: see text], our method generates 1840 biclusters in [Formula: see text] s. On a central nervous system embryonic tumor gene expression dataset of size 712,940, our algorithm takes 101 min to produce 747,069 biclusters, while recent competing algorithms take significantly more time to produce the same result. RUBic is also evaluated on five different gene expression datasets and shows a significant speed-up in execution time over existing approaches when extracting significant KEGG-enriched biclusters. RUBic can operate in two modes, base and flex: base mode generates maximal biclusters, while flex mode runs faster and generates fewer biclusters, based on their biological significance with respect to KEGG pathways. The code is available at https://github.com/CMATERJU-BIOINFO/RUBic for academic use only.
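To make the notion of a maximal bicluster in binary data concrete, here is a minimal sketch in Python. It is not RUBic's encoding or search strategy, which the abstract does not describe. It is a brute-force enumeration over row subsets, exponential in the number of rows and suitable only for toy matrices. The function name `maximal_biclusters`, its `min_rows` and `min_cols` parameters, and the example matrix are illustrative assumptions.

```python
from itertools import combinations


def maximal_biclusters(matrix, min_rows=2, min_cols=2):
    """Brute-force enumeration of maximal all-ones submatrices.

    Illustrative only: NOT the RUBic algorithm, and exponential in the
    number of rows. Returns a sorted list of (rows, cols) index tuples.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = set()
    for size in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns that hold a 1 in every selected row.
            cols = tuple(
                c for c in range(n_cols) if all(matrix[r][c] for r in rows)
            )
            if len(cols) < min_cols:
                continue
            # Extend the row set with every row that is all ones on those
            # columns, so the bicluster cannot grow in either direction.
            closed_rows = tuple(
                r for r in range(n_rows) if all(matrix[r][c] for c in cols)
            )
            found.add((closed_rows, cols))
    return sorted(found)


# Hypothetical 4 x 4 binary matrix (rows could be genes, columns samples).
data = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]

for rows, cols in maximal_biclusters(data):
    print("rows", rows, "cols", cols)
```

Each reported pair is a set of rows and a set of columns whose intersection contains only ones and cannot be extended by another row or column. This corresponds to what the abstract calls maximal biclusters in base mode. A scalable method has to avoid this exhaustive enumeration.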
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10655409 | PMC |
| http://dx.doi.org/10.1186/s12859-023-05534-3 | DOI Listing |
Sleep Adv
November 2024
Department of Innovative Technologies, Institute of Digital Technologies for Personalized Healthcare (MeDiTech), University of Applied Sciences and Arts of Southern Switzerland, Lugano, Switzerland.
Study Objectives: Polysomnography (PSG) currently serves as the benchmark for evaluating sleep disorders. Its discomfort makes long-term monitoring unfeasible, leading to bias in sleep quality assessment. Hence, less invasive, cost-effective, and portable alternatives need to be explored.
Sci Rep
December 2024
Electronic Engineering College, Heilongjiang University, Harbin, 150080, China.
With the rapid development of the semiconductor industry, Hardware Trojans (HT) as a kind of malicious function that can be implanted at will in all processes of integrated circuit design, manufacturing, and deployment have become a great threat in the field of hardware security. Side-channel analysis is widely used in the detection of HT due to its high efficiency, non-contact nature, and accuracy. In this paper, we propose a framework for HT detection based on contrastive learning using power consumption information in unsupervised or weakly supervised scenarios.
J Chem Inf Model
December 2024
Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
Machine learning (ML) methods provide a pathway to accurately predict molecular properties, leveraging patterns derived from structure-property relationships within materials databases. This approach holds significant importance in drug discovery and materials design, where the rapid, efficient screening of molecules can accelerate the development of new pharmaceuticals and chemical materials for highly specialized target application. Unsupervised and self-supervised learning methods applied to graph-based or geometric models have garnered considerable traction.
ACS Meas Sci Au
December 2024
Department of Chemistry, Queen's University, Kingston, Ontario, Canada K7K 0C2.
Ambient mass spectrometry (MS) technologies have been applied to spatial metabolomic profiling of various samples in an attempt to both increase analysis speed and reduce the length of sample preparation. Recent studies, however, have focused on improving the spatial resolution of ambient approaches. Finer resolution requires greater analysis times and commensurate computing power for more sophisticated data analysis algorithms and larger data sets.
Brief Bioinform
November 2024
College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China.
The rapid advancement of spatial transcriptomics (ST) sequencing technology has made it possible to capture gene expression with spatial coordinate information at the cellular level. Although many methods in ST data analysis can detect spatially variable genes (SVGs), these methods often fail to identify genes with explicit spatial expression patterns due to the lack of consideration for spatial domains. Considering spatial domains is crucial for identifying SVGs as it focuses the analysis of gene expression changes on biologically relevant regions, aiding in the more accurate identification of SVGs associated with specific cell types.