Motivation: Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations.

Results: A novel approach to extracting biclusters from binary datasets, BiBit, is introduced here. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression preprocessing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results.
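The bit-pattern idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each binary row is packed into an integer, that candidate patterns come from the bitwise AND of row pairs, and that a bicluster gathers every row containing a pattern. The function names and the `min_rows` and `min_cols` thresholds are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def encode_rows(matrix):
    """Pack each binary row into a Python int, first column = highest bit."""
    return [int("".join(str(v) for v in row), 2) for row in matrix]


def bit_pattern_biclusters(matrix, min_rows=2, min_cols=2):
    """Illustrative bit-pattern search (assumed scheme, not the published code).

    For every pair of rows, the bitwise AND gives a candidate column pattern.
    Each new pattern with enough set bits is matched against all rows, and
    the rows that contain it form one bicluster.
    """
    n_cols = len(matrix[0])
    encoded = encode_rows(matrix)
    seen = set()  # patterns already processed, to avoid duplicate biclusters
    result = []
    for i, j in combinations(range(len(encoded)), 2):
        pattern = encoded[i] & encoded[j]
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # A row belongs to the bicluster if it has a 1 wherever the pattern does.
        rows = [r for r, bits in enumerate(encoded) if (bits & pattern) == pattern]
        if len(rows) >= min_rows:
            cols = [c for c in range(n_cols) if (pattern >> (n_cols - 1 - c)) & 1]
            result.append((rows, cols))
    return result


example = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
biclusters = bit_pattern_biclusters(example)
for rows, cols in biclusters:
    print("rows", rows, "cols", cols)
```

Packing rows into integers lets a single AND compare many columns at once, which is the kind of saving a bit-level approach relies on. A practical version would likely split wide rows into fixed-size machine words rather than use arbitrary-precision integers.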

Availability: The source and binary codes, the datasets used in the experiments and the results can be found at: http://www.upo.es/eps/bigs/BiBit.html

Contact: dsrodbae@upo.es

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source: http://dx.doi.org/10.1093/bioinformatics/btr464
