This paper focuses on priority-based processing of streaming data. One of the greatest challenges in big data analytics is responding to a bursty input load. A common solution is dynamic resource provisioning; however, such techniques may not respond quickly enough to a change in load. Another option is to overprovision, but this wastes computing resources. This paper describes a technique for cases where resources are statically provisioned. It enables users to prioritize certain input data items so that, when the load suddenly increases, high-priority items are given precedence over low-priority items. The technique is implemented on the Spark Streaming engine.
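The general idea of giving high-priority items precedence under a fixed processing capacity can be illustrated with a small sketch. This is not the authors' implementation and does not use the Spark Streaming API; it is plain Python, and the class name `PriorityBatcher`, the `capacity` parameter, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
import heapq
from itertools import count


class PriorityBatcher:
    """Hypothetical helper: buffers items and releases at most
    `capacity` of them per batch, highest priority first."""

    def __init__(self, capacity):
        self.capacity = capacity      # fixed (statically provisioned) batch size
        self._heap = []
        self._seq = count()           # tie-breaker: preserves arrival order

    def offer(self, item, priority):
        # Assumption of this sketch: lower number = higher priority.
        heapq.heappush(self._heap, (priority, next(self._seq), item))

    def next_batch(self):
        # Take up to `capacity` items; anything left waits for a later batch.
        n = min(self.capacity, len(self._heap))
        return [heapq.heappop(self._heap)[2] for _ in range(n)]


# A burst of five items arrives, but only three fit in one batch.
batcher = PriorityBatcher(capacity=3)
burst = [("log-1", 1), ("alert-1", 0), ("log-2", 1), ("alert-2", 0), ("log-3", 1)]
for item, prio in burst:
    batcher.offer(item, prio)

first = batcher.next_batch()
second = batcher.next_batch()
print(first)   # ['alert-1', 'alert-2', 'log-1']
print(second)  # ['log-2', 'log-3']
```

In this sketch both high-priority items are processed in the first batch even though they arrived after a low-priority item, while the remaining low-priority items are deferred rather than dropped.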

Additional Metadata
Keywords priority scheduling, Spark, Spark Streaming
Persistent URL dx.doi.org/10.1109/BDCAT.2018.00034
Conference 5th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2018
Citation
Ajila, T. (Tobi), & Majumdar, S. (2019). Data Driven Priority Scheduling on Spark Based Stream Processing. In Proceedings - 5th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2018 (pp. 208–210). doi:10.1109/BDCAT.2018.00034