Motivation: As the number of publicly available microarray experiments increases, the ability to analyze extremely large datasets across multiple experiments becomes critical. Algorithms are needed that are fast and can cluster extremely large datasets without degrading cluster quality. Clustering is an unsupervised exploratory technique applied to microarray data to find similar data structures or expression patterns. Because of the high input/output costs involved and the large distance matrices calculated, most agglomerative clustering algorithms fail on large datasets (30,000+ genes/200+ arrays). In this article, we propose a new two-stage algorithm which partitions the high-dimensional space associated with microarray data using hyperplanes. The first stage is based on the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm, and the second stage is a conventional k-means clustering technique. This algorithm has been implemented in a software tool (HPCluster) designed to cluster gene expression data. We compared the clustering results of the two-stage hyperplane algorithm with those of the conventional k-means algorithm from other available programs. Because the first stage traverses the data in a single scan, performance and speed increase substantially. The data reduction accomplished in the first stage of the algorithm reduces the memory requirements, allowing us to cluster 44,460 genes without failure, and significantly decreases the time to completion when compared with popular k-means programs. The software was written in C# (.NET 1.1).
Availability: The program is freely available and can be downloaded from http://www.amdcc.org/bioinformatics/bioinformatics.aspx.
Supplementary Information: Supplementary data are available at Bioinformatics online.
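The two-stage idea described in the abstract (a single-scan data reduction followed by k-means on the reduced data) can be illustrated with a minimal Python sketch. This is not HPCluster's implementation: the hyperplane partitioning and the BIRCH tree are replaced here by a flat, distance-threshold pre-clustering that keeps only a count and a linear sum per summary. The function names, the `threshold` parameter, and the toy profiles are all assumptions made for illustration.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_scan_reduce(rows, threshold):
    """Stage 1 (simplified): one pass over the data.

    Each row is absorbed into the nearest summary whose centroid lies within
    `threshold`; otherwise it opens a new summary. A summary stores only a
    count and a linear sum (a reduced form of a BIRCH clustering feature).
    """
    counts, sums, labels = [], [], []
    for row in rows:
        best, best_d = -1, threshold ** 2
        for i, (n, s) in enumerate(zip(counts, sums)):
            d = dist2(row, [v / n for v in s])
            if d <= best_d:
                best, best_d = i, d
        if best < 0:
            counts.append(1)
            sums.append(list(row))
            best = len(counts) - 1
        else:
            counts[best] += 1
            sums[best] = [a + b for a, b in zip(sums[best], row)]
        labels.append(best)
    centroids = [[v / n for v in s] for n, s in zip(counts, sums)]
    return centroids, counts, labels


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Stage 2: Lloyd's k-means on summary centroids, weighted by their counts."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: dist2(p, centers[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:
                continue  # keep the previous center for an empty cluster
            total = sum(weights[i] for i in members)
            centers[c] = [sum(weights[i] * points[i][d] for i in members) / total
                          for d in range(dim)]
    return centers, assign


def two_stage_cluster(rows, k, threshold, seed=0):
    """Map every original row to a final cluster via its stage-1 summary."""
    centroids, counts, row_to_summary = single_scan_reduce(rows, threshold)
    _, summary_to_cluster = weighted_kmeans(centroids, counts, k, seed=seed)
    return [summary_to_cluster[s] for s in row_to_summary]


# Toy expression profiles (hypothetical values): two well-separated groups.
profiles = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.2],
    [5.0, 5.1, 4.9], [5.1, 5.0, 5.0], [4.9, 5.2, 5.1],
]
labels = two_stage_cluster(profiles, k=2, threshold=1.0)
```

In this sketch, k-means only ever sees the stage-1 summaries, so its cost depends on the number of summaries rather than on the number of genes. That is the kind of memory and time saving the abstract attributes to the first stage.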
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2672630 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btp123 | DOI Listing |
J Cardiothorac Surg
January 2025
Dongying People's Hospital (Dongying Hospital of Shandong Provincial Hospital Group), Dongying, 257091, Shandong, People's Republic of China.
Background: Atherosclerosis (AS) is increasingly recognized as a chronic inflammatory disease that significantly compromises vascular health and acts as a major contributor to cardiovascular diseases. Advancements in lipidomics and metabolomics have unveiled the complex role of fatty acid metabolism (FAM) in both healthy and pathological states. However, the specific roles of fatty acid metabolism-related genes (FAMGs) in shaping therapeutic approaches, especially in AS, remain largely unexplored and are a subject of ongoing research.
J Imaging Inform Med
January 2025
Faculty of Medicine and Pharmacy of Rabat, Mohammed V University of Rabat, Rabat, 10000, Morocco.
Gastrointestinal (GI) disease examination presents significant challenges to doctors due to the intricate structure of the human digestive system. Colonoscopy and wireless capsule endoscopy are the most commonly used tools for GI examination. However, the large amount of data generated by these technologies requires the expertise and intervention of doctors for disease identification, making manual analysis a very time-consuming task.
Sci Rep
January 2025
Fischell Department of Bioengineering, University of Maryland, College Park, USA.
The development of optical sensors for label-free quantification of cell parameters has numerous uses in the biomedical arena. However, using current optical probes requires the laborious collection of sufficiently large datasets that can be used to calibrate optical probe signals to true metabolite concentrations. Further, most practitioners find it difficult to confidently adapt black box chemometric models that are difficult to troubleshoot in high-stakes applications such as biopharmaceutical manufacturing.
Am J Hum Genet
January 2025
Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.
Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into summary data, such as allele frequencies, masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted-for substructure limits summary data usability, especially for understudied or admixed populations.
Phys Med Biol
January 2025
Department of Trauma and Reconstructive Surgery, BG Hospital Bergmanntrost, Merseburger Straße 165 06112 Halle, Halle, Sachsen-Anhalt, 06112, GERMANY.
The purpose of this study was to develop a robust deep learning approach trained with a small in-vivo MRI dataset for multi-label segmentation of all eight carpal bones for therapy planning and wrist dynamic analysis. Approach: A small dataset of 15 3.0-T MRI scans from five healthy subjects was employed in this study.