Sorting is a common operation in many applications; it consumes resources and adds computation overhead. In Big Data multi-query workloads, shared sort operations are fairly large and incur high-cost I/Os, whether the sort is explicit or implicit. In particular, a multi-query workload that includes aggregation and sort operations takes a long time to execute because similar tasks reshuffle the same data multiple times. Exploiting the shared data and shared sort opportunities of similar tasks therefore makes it possible to reuse previous results to optimize the multi-query workload. To address shared data, our previous work, the Multi-Query Optimization Using Tuple Size and Histogram (MOTH) system, was introduced to consider the granularity of data-sharing opportunities among queries. However, it did not consider the time overhead of redundant in-network data movement (i.e., the shuffling time needed to transfer intermediate data for sort operations). The MOTH system has therefore been extended into the SOOM (Sort-Based Optimizer over MOTH) system, which exploits shared sort opportunities, including the explicit sorts of sort queries and the implicit sorts of aggregation queries. SOOM adds two modules for this purpose, the query explorer and the sort exploiter, which leverage the existing MOTH system to optimize multiple aggregation and sort queries. Experimental evaluation on Hadoop-like infrastructures shows that SOOM outperforms the naive and state-of-the-art techniques in query execution time by 45% and 30%, respectively, while achieving a maximal intermediate data size reduction of 67% and 61% on average, respectively.
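The general idea of sharing a sort between an explicit sort query and an aggregation query (whose grouping implies a sort) can be illustrated with a small in-memory sketch. This is not the SOOM implementation: the function `run_shared`, the query dictionaries, and the sample rows are hypothetical names chosen for illustration, and the sketch assumes that all queries read the same input and that each query names a single key.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def run_shared(rows, queries):
    """Answer all queries while sorting the input once per distinct key.

    Hypothetical illustration only: a real system would share the
    shuffle/sort phase of distributed jobs, not an in-memory list.
    """
    # Group queries that need the data ordered on the same key.
    queries_by_key = defaultdict(list)
    for query in queries:
        queries_by_key[query["key"]].append(query)

    results = {}
    sorts_performed = 0
    for key, group in queries_by_key.items():
        # One sort serves every query in this group.
        ordered = sorted(rows, key=itemgetter(key))
        sorts_performed += 1
        for query in group:
            if query["kind"] == "sort":
                # Explicit sort: the shared ordering is the answer.
                results[query["name"]] = ordered
            else:
                # Aggregation: the implicit sort lets us sum each
                # run of equal keys in a single pass.
                results[query["name"]] = {
                    k: sum(r[query["value"]] for r in run)
                    for k, run in groupby(ordered, key=itemgetter(key))
                }
    return results, sorts_performed


rows = [
    {"region": "east", "sales": 5},
    {"region": "west", "sales": 3},
    {"region": "east", "sales": 2},
]
queries = [
    {"name": "q1", "kind": "sort", "key": "region"},
    {"name": "q2", "kind": "aggregate", "key": "region", "value": "sales"},
    {"name": "q3", "kind": "sort", "key": "sales"},
]
results, sorts_performed = run_shared(rows, queries)
```

Here three queries are answered with two sorts, because `q1` and `q2` share the ordering on `region`. Running each query independently would sort the same data three times.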
Source (DOI): http://dx.doi.org/10.1089/big.2019.0023