The Multi-Armed Bernoulli Bandit (MABB) problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Although poised in an abstract framework, the applications of the MABB are numerous (Gelly and Wang, 2006; Kocsis and Szepesvari, 2006; Granmo et al., 2007; Granmo and Bouhmala, 2007). On the other hand, while Bayesian methods are generally computationally intractable, they have been shown to provide a standard for optimal decision making. This paper proposes a novel MABB solution scheme that is inherently Bayesian in nature, and which yet avoids the computational intractability by relying simply on updating the hyper-parameters of the sibling conjugate distributions, and on simultaneously sampling randomly from the respective posteriors. Although, in principle, our solution is generic, to be concise, we present here the strategy for Bernoulli distributed rewards. Extensive experiments demonstrate that our scheme outperforms recently proposed bandit playing algorithms. We thus believe that our methodology opens avenues for obtaining improved novel solutions.

Additional Metadata
Keywords Bandit problems, Bayesian learning, Conjugate priors, Sampling
Conference 2nd International Conference on Agents and Artificial Intelligence, ICAART 2010
Norheim, T. (Thomas), Brådland, T. (Terje), Granmo, O.-C. (Ole-Christoffer), & Oommen, J. (2010). A generic solution to Multi-Armed Bernoulli Bandit problems based on random sampling from sibling conjugate priors. In ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence, Proceedings (pp. 36–44).