Visualizing streaming high-dimensional data often requires balancing the speed of the dimensionality reduction algorithm, the quality of the visualized data patterns, and the stability of view graphs, which usually change over time as new data arrive. Existing methods for streaming high-dimensional data visualization primarily arrange their essential modules serially and often struggle to satisfy all of these design considerations. In this research, we propose a novel parallel framework for streaming high-dimensional data visualization to achieve high data processing speed, high-quality data patterns, and good stability in visual presentations. The framework arranges all essential modules in parallel to mitigate the delays caused by modules waiting on one another in serial setups. In addition, to facilitate the parallel pipeline, we redesign these modules with a parametric non-linear embedding method for embedding new data, an incremental learning method for updating the embedding function online, and a hybrid strategy for optimized embedding updating. We also improve the coordination mechanism among these modules. Our experiments show that our method has advantages in embedding speed, quality, and stability over existing methods for visualizing streaming high-dimensional data.
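The parallel arrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `ToyEmbedder`, `run_pipeline`, the fixed weights, and the mean-nudging update rule are all hypothetical stand-ins chosen for illustration. The sketch only shows the structural idea that new points are embedded immediately with the current parametric function while a separate worker updates that function in the background, so the embedding step does not wait for the update step. Python threads are used here to show the decoupling, not to claim real speedups.

```python
import math
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


class ToyEmbedder:
    """Hypothetical stand-in for a parametric non-linear embedding function."""

    def __init__(self, dim):
        self._lock = threading.Lock()
        # Fixed, deterministic weights: two output rows of `dim` inputs each.
        self._weights = [
            [0.1 * (i + 1) * (1 if (i + j) % 2 == 0 else -1) for i in range(dim)]
            for j in range(2)
        ]
        self.updates = 0

    def embed(self, point):
        # Copy the weights under the lock, then compute outside of it.
        with self._lock:
            rows = [row[:] for row in self._weights]
        return tuple(
            math.tanh(sum(w * x for w, x in zip(row, point))) for row in rows
        )

    def update(self, batch):
        # Placeholder for incremental learning (an assumption, not the paper's
        # method): nudge every weight row slightly toward the batch mean.
        dim = len(batch[0])
        mean = [sum(p[i] for p in batch) / len(batch) for i in range(dim)]
        with self._lock:
            for row in self._weights:
                for i in range(dim):
                    row[i] += 0.01 * (mean[i] - row[i])
            self.updates += 1


def run_pipeline(stream, dim, batch_size=4):
    embedder = ToyEmbedder(dim)
    incoming = queue.Queue()
    to_update = queue.Queue()
    layout = []  # 2-D positions, in arrival order

    def embed_worker():
        # Embeds each point right away, then hands it to the updater.
        while True:
            point = incoming.get()
            if point is STOP:
                to_update.put(STOP)
                return
            layout.append(embedder.embed(point))
            to_update.put(point)

    def update_worker():
        # Updates the embedding function in batches, off the embedding path.
        batch = []
        while True:
            point = to_update.get()
            if point is STOP:
                if batch:
                    embedder.update(batch)
                return
            batch.append(point)
            if len(batch) == batch_size:
                embedder.update(batch)
                batch = []

    workers = [
        threading.Thread(target=embed_worker),
        threading.Thread(target=update_worker),
    ]
    for worker in workers:
        worker.start()
    for point in stream:
        incoming.put(point)
    incoming.put(STOP)
    for worker in workers:
        worker.join()
    return layout, embedder.updates
```

Calling `run_pipeline(stream, dim=3)` on a list of 3-dimensional points returns one 2-D position per point together with the number of background updates performed. Because the two workers run concurrently, the exact positions depend on how the embedding and update steps interleave; a real system would need a coordination mechanism, which this sketch does not attempt to reproduce.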

Source
http://dx.doi.org/10.1109/TVCG.2023.3326515

Publication Analysis

Top Keywords

streaming high-dimensional: 16
high-dimensional data: 16
data: 9
parallel framework: 8
framework streaming: 8
dimensionality reduction: 8
data patterns: 8
existing methods: 8
data visualization: 8
essential modules: 8

Similar Publications

The incremental broad learning system (IBLS) is an effective and efficient incremental learning method based on the broad learning paradigm. Owing to its streamlined network architecture and flexible dynamic update scheme, IBLS can rapidly reconstruct a model incrementally from the previous one without retraining entirely from scratch, which makes it adept at handling streaming data. However, two prominent deficiencies persist in IBLS and limit its wider adoption in large-scale data stream scenarios.

Article Synopsis
  • The GFPrint™ algorithm is designed to analyze large genetic sequencing data, helping to uncover important features related to diseases.
  • It has been validated using cancer genomic datasets, revealing gene mutations that negatively impact survival in various cancers like colorectal cancer and breast cancer.
  • GFPrint™ is accessible online, making it useful for any medical field where understanding genetic profiles can aid in disease management.

Semantic embedding based online cross-modal hashing method.

Sci Rep

January 2024

School of Data Science and Computer Science, Shandong Women's University, Jinan, 250300, China.

Hashing has been extensively utilized in cross-modal retrieval due to its high efficiency in handling large-scale, high-dimensional data. However, most existing cross-modal hashing methods operate as offline learning models, which learn hash codes in a batch-based manner and prove to be inefficient for streaming data. Recently, several online cross-modal hashing methods have been proposed to address the streaming data scenario.


Pairwise learning is an important machine-learning topic with many practical applications. An online algorithm is the first choice for processing streaming data and is preferred for handling large-scale pairwise learning problems. However, existing online pairwise learning algorithms are not scalable and efficient enough for large-scale high-dimensional data, because they were designed based on singly stochastic gradients.
