We consider running-time optimization for band-joins in a distributed system, e.g., the cloud. To balance load across worker machines, input has to be partitioned, which causes duplication. We explore how to resolve this tension between and for band-joins between two relations. Previous work suffered from high optimization cost or considered partitionings that were too restricted (resulting in suboptimal join performance). Our main insight is that with the appropriate split scoring measure can achieve both low optimization cost and low join cost. It is the first approach that is not only effective for one-dimensional band-joins but also for joins on multiple attributes. Experiments indicate that our method is able to find partitionings that are within 10% of the for both maximum load per worker and input duplication for a broad range of settings, significantly improving over previous work.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7872589PMC
http://dx.doi.org/10.1145/3318464.3389750DOI Listing

Publication Analysis

Top Keywords

load worker
8
previous work
8
optimization cost
8
near-optimal distributed
4
band-joins
4
distributed band-joins
4
band-joins recursive
4
recursive partitioning
4
partitioning consider
4
consider running-time
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!