We demonstrate that parallel deterministic sample sort for many-core GPUs (GPU BUCKET SORT) is not only considerably faster than the best comparison-based sorting algorithm for GPUs (THRUST MERGE [Satish et.al., Proc. IPDPS 2009]) but also as fast as randomized sample sort for GPUs (GPU SAMPLE SORT [Leischner et.al., Proc. IPDPS 2010]). However, deterministic sample sort has the advantage that bucket sizes are guaranteed and therefore its running time does not have the input data dependent fluctuations that can occur for randomized sample sort.

Additional Metadata
Keywords GPU, Parallel Algorithms, Sorting
Persistent URL dx.doi.org/10.1142/S0129626412500089
Journal Parallel Processing Letters
Dehne, F, & Zaboli, H. (Hamidreza). (2012). Deterministic sample sort for GPUs. Parallel Processing Letters (Vol. 22). doi:10.1142/S0129626412500089