J. Qin and M.A. Bauer (Canada)
Simulations, resource co-allocation, multiple HPC clusters, inter-cluster communication.
To use HPC clusters more effectively for even larger computations, users are looking to interconnect multiple HPC clusters into a grid. To use such grids effectively, it may be desirable to split jobs requiring many processes and co-allocate them across multiple clusters. The benefit, in terms of reducing users' turn-around time, however, ultimately depends on the inter-cluster communication cost. Previous studies of job co-allocation strategies commonly used a uniform slowdown ratio and a static communication model to examine the impact on a job's execution when the job is split across multiple clusters. In reality, however, the slowdown ratio is unlikely to be uniform when there is a choice of multiple clusters with different communication links. Moreover, the slowdown ratio may change dynamically with run-time conditions. In this paper, we report on a simulator developed to simulate the dynamic behavior of jobs across multiple clusters. The simulator has been validated against experiments run across two HPC clusters. The overall objective of the work is to understand the impact of communication on multi-processor jobs in order to develop scheduling and co-allocation strategies that can accommodate communication factors.
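The contrast between a uniform slowdown ratio and a link-dependent one can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulator described in the paper: the function names, the cluster labels `A`, `B`, `C`, the rule that a split job is governed by its slowest link, and all ratio values are hypothetical. Neither function models ratios that change at run time, which is the dynamic behavior the simulator is meant to capture.

```python
from itertools import combinations

# Hypothetical per-link slowdown ratios (assumed values, not measured data).
LINK_RATIO: dict[frozenset, float] = {
    frozenset({"A", "B"}): 1.25,
    frozenset({"A", "C"}): 1.5,
    frozenset({"B", "C"}): 2.0,
}


def uniform_runtime(base_runtime: float, clusters: list[str], ratio: float) -> float:
    """Uniform model: any job split over more than one cluster is slowed by one fixed ratio."""
    return base_runtime * ratio if len(clusters) > 1 else base_runtime


def link_aware_runtime(base_runtime: float, clusters: list[str],
                       link_ratio: dict[frozenset, float]) -> float:
    """Link-aware model (assumption): the slowest inter-cluster link used sets the slowdown."""
    if len(clusters) <= 1:
        return base_runtime
    worst = max(link_ratio[frozenset(pair)] for pair in combinations(clusters, 2))
    return base_runtime * worst


if __name__ == "__main__":
    for placement in (["A"], ["A", "B"], ["B", "C"], ["A", "B", "C"]):
        print(placement,
              uniform_runtime(100.0, placement, 1.5),
              link_aware_runtime(100.0, placement, LINK_RATIO))
```

Under the uniform model every split placement of a 100 s job costs the same 150 s, while the link-aware variant distinguishes a split over the fast `A`-`B` link (125 s) from one that uses the slow `B`-`C` link (200 s), which is the kind of difference a co-allocation strategy would need to account for.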