During the shuffle stage of the MapReduce framework, a large volume of data may be relocated to the same destination at the same time, which can create a network hotspot. On the other hand, it is always more effective to achieve better data locality by moving the computation closer to the data than the other way around. However, doing so may result in partitioning skew, which is characterized by unbalanced computational loads across the destinations. Consequently, shuffling algorithms should consider all of the following criteria: data locality, partitioning skew, and network hotspot. To this end, we introduce MCSA, a Multi-Criteria shuffling algorithm for the MapReduce scheduling stage that rests on three cost functions to accurately reflect the trade-offs between these criteria. Extensive simulations were conducted, and their results show that the MCSA-based scheduler consistently outperforms other schedulers with respect to these criteria. Furthermore, the MCSA-based scheduler can be easily adjusted to meet the distinct needs of different customers.
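
The abstract does not define the three cost functions, so the following is only a minimal sketch of how a multi-criteria destination choice could look, not MCSA itself. The `Node` fields, the normalized cost terms, and the `weights` tuple are all hypothetical, introduced here purely for illustration. Each candidate node receives a weighted sum of a locality cost, a skew cost, and a hotspot cost, and the partition is sent to the node with the lowest total.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a candidate reduce destination."""
    name: str
    local_bytes: int     # bytes of the partition already stored on this node
    assigned_load: int   # bytes of reduce work already assigned to this node
    inbound_bytes: int   # bytes currently being shuffled toward this node


def pick_destination(nodes, partition_bytes, weights=(1.0, 1.0, 1.0)):
    """Return the node with the lowest weighted cost.

    Illustrative scoring only (an assumption, not the paper's cost functions).
    Assumes partition_bytes > 0 and a non-empty list of nodes.
    """
    w_locality, w_skew, w_hotspot = weights
    # Normalize load and traffic by the current maximum; fall back to 1
    # when every node is idle to avoid dividing by zero.
    max_load = max(n.assigned_load for n in nodes) or 1
    max_inbound = max(n.inbound_bytes for n in nodes) or 1

    def cost(node):
        locality = 1.0 - node.local_bytes / partition_bytes  # fraction that must move
        skew = node.assigned_load / max_load                 # relative compute load
        hotspot = node.inbound_bytes / max_inbound           # relative inbound traffic
        return w_locality * locality + w_skew * skew + w_hotspot * hotspot

    return min(nodes, key=cost)


nodes = [
    Node("A", local_bytes=80, assigned_load=100, inbound_bytes=100),
    Node("B", local_bytes=20, assigned_load=10, inbound_bytes=10),
]
balanced = pick_destination(nodes, partition_bytes=100)
locality_only = pick_destination(nodes, partition_bytes=100, weights=(1.0, 0.0, 0.0))
```

With equal weights the sketch favors the lightly loaded node B, while a locality-only weighting favors node A, which already holds most of the data. Changing the weights in this way is one plausible reading of how such a scheduler could be tuned to different customer needs.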

Additional Metadata
Persistent URL dx.doi.org/10.1109/UIC-ATC.2017.8397651
Conference 2017 IEEE SmartWorld Ubiquitous Intelligence and Computing, Advanced and Trusted Computed, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People and Smart City Innovation, SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI 2017
Citation
Corriveau, J., Lyu, L. (Leo), Elhabyan, R. (Riham), & Shi, W. (2018). MCSA: A multi-criteria shuffling algorithm for the MapReduce framework. In 2017 IEEE SmartWorld Ubiquitous Intelligence and Computing, Advanced and Trusted Computed, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People and Smart City Innovation, SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI 2017 - Conference Proceedings (pp. 1–6). doi:10.1109/UIC-ATC.2017.8397651