Resource allocation and scheduling on clouds are required to harness the power of the underlying resource pool so that the service provider can meet users' quality of service requirements, which are often captured in service level agreements (SLAs). This paper focuses on resource allocation and scheduling on clouds and clusters that process MapReduce jobs with SLAs. The resource allocation and scheduling problem is modelled as an optimization problem using constraint programming, and a novel MapReduce Constraint Programming based Resource Management algorithm (MRCP-RM) is devised that can effectively process an open stream of MapReduce jobs, where each job is characterized by an SLA comprising an earliest start time, a required execution time, and an end-to-end deadline. A detailed performance evaluation of MRCP-RM is conducted for an open system subjected to a stream of job arrivals, using both simulation and experimentation on a real system. The experiments on a real system are performed on a Hadoop cluster (deployed on Amazon EC2) that runs our new Hadoop Constraint Programming based Resource Management algorithm (HCP-RM), which incorporates a technique for handling data locality. The results of the performance evaluation demonstrate the effectiveness of MRCP-RM/HCP-RM in generating a schedule that leads to a low proportion of jobs missing their deadlines (P), and also provide insights into system behaviour and performance. In the simulation experiments, MRCP-RM achieves on average an 82 percent lower P than a technique from the existing literature when processing a synthetic workload from Facebook. Furthermore, in the experiments performed on the Hadoop cluster deployed on Amazon EC2, HCP-RM achieves on average a 63 percent lower P than an EDF-Scheduler across a wide variety of workload and system parameters.
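
To make the SLA model and the metric P concrete, the following is a minimal Python sketch. It is not MRCP-RM or HCP-RM, and it is not the EDF-Scheduler evaluated in the paper: it is a generic non-preemptive earliest-deadline-first (EDF) policy, shown only for illustration. The names `Job` and `edf_missed_proportion` are hypothetical, and the sketch assumes each job is a single indivisible task running on identical slots, which ignores the map/reduce task structure and data locality.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Job:
    """Hypothetical SLA record: earliest start, execution time, deadline."""
    name: str
    earliest_start: float
    exec_time: float
    deadline: float


def edf_missed_proportion(jobs, num_slots=1):
    """Run non-preemptive EDF on identical slots and return P,
    the proportion of jobs that finish after their deadline."""
    if not jobs:
        return 0.0
    pending = sorted(jobs, key=lambda j: j.earliest_start)
    slots = [0.0] * num_slots          # min-heap of times at which slots free up
    heapq.heapify(slots)
    ready = []                         # min-heap of released jobs, keyed by deadline
    i = 0                              # index of the next job not yet released
    missed = 0
    while i < len(pending) or ready:
        free_at = heapq.heappop(slots)
        # If nothing is ready, idle the slot until the next job is released.
        if not ready and pending[i].earliest_start > free_at:
            free_at = pending[i].earliest_start
        # Release every job whose earliest start time has passed.
        while i < len(pending) and pending[i].earliest_start <= free_at:
            heapq.heappush(ready, (pending[i].deadline, i, pending[i]))
            i += 1
        _, _, job = heapq.heappop(ready)   # earliest deadline first
        finish = free_at + job.exec_time
        if finish > job.deadline:
            missed += 1
        heapq.heappush(slots, finish)
    return missed / len(jobs)


jobs = [
    Job("A", earliest_start=0, exec_time=4, deadline=5),
    Job("B", earliest_start=0, exec_time=3, deadline=4),
    Job("C", earliest_start=1, exec_time=2, deadline=6),
]
p_one_slot = edf_missed_proportion(jobs, num_slots=1)
p_two_slots = edf_missed_proportion(jobs, num_slots=2)
```

With one slot, jobs A and C finish late in this toy workload; with two slots, all three jobs meet their deadlines. A constraint programming formulation, as used by MRCP-RM, would instead search over task-to-resource assignments and start times jointly rather than dispatching greedily by deadline.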

Additional Metadata
Keywords Constraint programming, Hadoop scheduler, Resource allocation and scheduling on clusters and clouds
Persistent URL dx.doi.org/10.1109/TPDS.2016.2617324
Journal IEEE Transactions on Parallel and Distributed Systems
Citation
Lim, N. (Norman), Majumdar, S., & Ashwood-Smith, P. (Peter). (2017). MRCP-RM: A technique for resource allocation and scheduling of MapReduce jobs with deadlines. IEEE Transactions on Parallel and Distributed Systems, 28(5), 1375–1389. doi:10.1109/TPDS.2016.2617324