Laboratory of Structural Methods of Data Analysis in Predictive Modeling, Moscow Institute of Physics and Technology
On Effectiveness of the Mirror Descent Algorithm for a Stochastic Multi-Armed Bandit Governed by a Stationary Finite Markov Chain
Conference Name: The 3rd Australian Control Conference (AUCC2013)
Date Published: 11/2013
Publisher: Engineers Australia
Conference Location: Perth, Australia
ISBN Number: 978-1-4799-2497-4

In this article, we study the effectiveness of the recently developed Mirror Descent Randomized Control Algorithm as applied to a class of homogeneous finite Markov chains governed by a stochastic multi-armed bandit with unknown mean losses. We prove explicit, non-asymptotic upper and lower bounds for the mean losses at a given (finite) time horizon. These bounds are very similar as functions of the problem parameters and the time horizon, but differ in the logarithmic term and the absolute constant. A numerical example illustrates the theoretical results.

Authors: Nazin, Alexander; Miller, B.

Date: 17 November 2014

Status: published

Year: 2013

Research areas

Primal-dual subgradient methods

Huge-scale problems

Additional materials