This paper deals with the extremely pertinent problem of Web crawling, which is far from trivial considering the magnitude and all-pervasive nature of the World-Wide Web. While numerous AI tools can be used to deal with this task, in this paper we map the problem onto the combinatorially hard stochastic non-linear fractional knapsack problem, which, in turn, is solved using Learning Automata (LA). Such LA-based solutions have recently been shown to outperform previous state-of-the-art approaches to resource allocation in Web monitoring. However, the ever-growing deployment of distributed systems raises the need for solutions that cope with a distributed setting. In this paper, we present a novel scheme for solving the non-linear fractional bin packing problem. Furthermore, we demonstrate that our scheme has applications to Web crawling, i.e., distributed resource allocation, and in particular, to distributed Web monitoring. Comprehensive experimental results demonstrate the superiority of our scheme over other classical approaches.

Additional Metadata
Persistent URL dx.doi.org/10.1109/CSE.2014.40
Conference 17th IEEE International Conference on Computational Science and Engineering, CSE 2014 - Jointly with 13th IEEE International Conference on Ubiquitous Computing and Communications, IUCC 2014, 13th International Symposium on Pervasive Systems, Algorithms, and Networks, I-SPAN 2014 and 8th International Conference on Frontier of Computer Science and Technology, FCST 2014
Citation
Yazidi, A. (Anis), Oommen, J., Granmo, O.-C. (Ole-Christoffer), & Goodwin, M. (Morten). (2015). On utilizing stochastic non-linear fractional bin packing to resolve distributed web crawling. In Proceedings - 17th IEEE International Conference on Computational Science and Engineering, CSE 2014, Jointly with 13th IEEE International Conference on Ubiquitous Computing and Communications, IUCC 2014, 13th International Symposium on Pervasive Systems, Algorithms, and Networks, I-SPAN 2014 and 8th International Conference on Frontier of Computer Science and Technology, FCST 2014 (pp. 32–37). doi:10.1109/CSE.2014.40