HPCC randomaccess benchmark for next generation supercomputers
Vikas Aggarwal, Yogish Sabharwal, et al.
IPDPS 2009
This paper presents a parallelization of the Quicksort algorithm that is suitable for execution on a shared memory multiprocessor with an efficient implementation of the fetch-and-add operation. The partitioning phase of Quicksort, which has been considered a serial bottleneck, is cooperatively executed in parallel by many processors through the use of fetch-and-add. The parallel algorithm maintains the in-place nature of Quicksort, thereby allowing internal sorting of large arrays. A class of fetch-and-add-based algorithms for dynamically scheduling processors to subproblems is presented. Adaptive scheduling algorithms in this class have low overhead and achieve effective processor load balancing. The basic algorithm is shown to execute in an average of O(log(N)) time on an N-processor PRAM assuming a constant time fetch-and-add. Estimated speedups, based on simulations, are also presented for cases when the number of items to be sorted is much greater than the number of processors. © 1990 IEEE
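The cooperative partitioning idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the paper partitions in place, whereas this simplified version writes into a separate output buffer. The names `FetchAndAdd` and `parallel_partition` are hypothetical, and a lock stands in for a hardware fetch-and-add. Each worker claims the next unread input index with one fetch-and-add, then claims an output slot from the left or right end with another, so no two workers ever write to the same slot.

```python
import threading


class FetchAndAdd:
    """Shared counter. A lock simulates an atomic fetch-and-add (assumption)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        # Atomically return the old value and add delta to the counter.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def parallel_partition(items, pivot, num_workers=4):
    """Partition items around pivot into a new list (not in place).

    Returns (out, split): out[:split] holds items < pivot and
    out[split:] holds items >= pivot.
    """
    n = len(items)
    out = [None] * n
    next_in = FetchAndAdd(0)        # next unread input index
    next_low = FetchAndAdd(0)       # next free output slot from the left
    next_high = FetchAndAdd(n - 1)  # next free output slot from the right

    def worker():
        while True:
            i = next_in.fetch_and_add(1)  # claim one input item
            if i >= n:
                return
            x = items[i]
            if x < pivot:
                out[next_low.fetch_and_add(1)] = x
            else:
                out[next_high.fetch_and_add(-1)] = x

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    split = next_low.fetch_and_add(0)  # number of items below the pivot
    return out, split


if __name__ == "__main__":
    data = [5, 3, 8, 1, 9, 2, 7]
    out, split = parallel_partition(data, pivot=5)
    print(out[:split], out[split:])
```

The order of items within each side depends on thread scheduling, so only the split point and the membership of each side are deterministic. The same claim-by-counter pattern is one way to picture the dynamic assignment of processors to subproblems that the abstract mentions.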