SOOM: Sort-Based Optimizer for Big Data Multi-Query.

Big Data

Faculty of Computers and Information, Cairo University, Cairo, Egypt.

Published: February 2020

Sorting is a common operation in many applications, and it consumes resources and adds computation overhead. In Big Data multi-query workloads, the shared sort operations are fairly large and incur high-cost I/O, whether the sorts are explicit or implicit. In particular, multi-query workloads that include aggregation and sort operations take a long time to execute because similar tasks reshuffle the same data multiple times. Exploiting the shared data and shared sort opportunities among similar tasks therefore makes it possible to reuse previous results and optimize multiple queries together. To address shared data, our previous work, the Multi-Query Optimization Using Tuple Size and Histogram (MOTH) system, considered the granularity of data-sharing opportunities among multiple queries. However, it did not consider the time overhead of redundant in-network data movement (i.e., the shuffling time needed to transfer intermediate data for sort operations). The MOTH system has therefore been extended into the SOOM (Sort-Based Optimizer over MOTH) system, which exploits shared sort opportunities, including the explicit sorts of sort queries and the implicit sorts of aggregation queries. SOOM adds two modules for this purpose, a query explorer and a sort exploiter, which build on the existing MOTH system to optimize multiple aggregation and sort queries. The experimental evaluation shows that, over Hadoop-like infrastructures, SOOM outperforms the naive and state-of-the-art techniques in query execution time by 45% and 30%, respectively, while reducing the maximal intermediate data size by 67% and 61% on average, respectively.
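
The idea of a shared sort can be illustrated with a toy, single-machine sketch. This is not the SOOM implementation; the function names (`shared_sort`, `sort_query`, `aggregation_query`) and the sample records are assumptions made for illustration only. It shows one sort pass serving both an explicit sort query and an aggregation query whose grouping relies on sorted input (the implicit sort).

```python
from itertools import groupby
from operator import itemgetter


def shared_sort(records, key_index=0):
    """Sort the records once so that several queries can reuse the ordering."""
    return sorted(records, key=itemgetter(key_index))


def sort_query(sorted_records):
    """Explicit sort query: the shared ordering is already the answer."""
    return list(sorted_records)


def aggregation_query(sorted_records, key_index=0, value_index=1):
    """Aggregation query: grouping consumes sorted input (the implicit sort)."""
    return {
        key: sum(row[value_index] for row in group)
        for key, group in groupby(sorted_records, key=itemgetter(key_index))
    }


# Hypothetical sample data: (key, value) pairs.
records = [("b", 2), ("a", 1), ("b", 5), ("a", 3)]

ordered = shared_sort(records)        # single sort pass
sorted_result = sort_query(ordered)   # reuses the ordering
totals = aggregation_query(ordered)   # reuses the same ordering
```

In a Hadoop-like setting the analogous saving would come from avoiding a second shuffle of the same intermediate data, rather than from avoiding a second in-memory sort as in this sketch.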

Source
http://dx.doi.org/10.1089/big.2019.0023
