**2018:**
**Best of both worlds: Stochastic & adversarial best-arm identification,**
Yasin Abbasi-Yadkori, Peter Bartlett, Victor Gabillon, Alan Malek & Michal Valko,
*COLT 2018*
[Paper]
[Poster]
[Abstract]

We study bandit best-arm identification
with arbitrary and potentially adversarial rewards.
A simple uniform random
learner obtains the optimal rate of error in the
adversarial scenario. However, this
type of strategy is suboptimal when the rewards are sampled
stochastically. Therefore, we ask:

*Can we design a learner that
performs optimally in both the stochastic
and adversarial problems while not being aware of the nature of the rewards?*

First, we show that designing such a learner is impossible in general.
In particular, to be robust to adversarial rewards, we can
only guarantee optimal rates of
error on a subset of the stochastic problems. We give a lower bound
that characterizes the optimal rate in stochastic problems if the
strategy is constrained to be robust to adversarial rewards.
Finally, we design a simple parameter-free algorithm and show that its
probability of error matches (up to log factors) the lower bound in
stochastic problems, and it is also robust to adversarial ones.
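The uniform random learner mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's parameter-free algorithm: the function name `uniform_best_arm`, the reward oracle `pull`, and the toy Gaussian instance are all assumptions made for the example.

```python
import random

def uniform_best_arm(pull, n_arms, budget):
    """Pull arms round-robin (uniform allocation) for `budget` rounds,
    then recommend the arm with the highest empirical mean."""
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(budget):
        arm = t % n_arms  # uniform allocation over arms
        sums[arm] += pull(arm)
        counts[arm] += 1
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    return max(range(n_arms), key=means.__getitem__)

# Toy stochastic instance: arm 2 has the highest mean reward.
random.seed(0)
mus = [0.2, 0.4, 0.8]
best = uniform_best_arm(lambda a: random.gauss(mus[a], 0.1), 3, 300)
```

In the adversarial setting this uniform allocation already achieves the optimal error rate; the point of the paper is that matching the stochastic lower bound as well requires more care.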

**2017:**
**Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem,**
Yasin Abbasi-Yadkori, Peter Bartlett & Victor Gabillon,
*NIPS 2017*
[Paper]
[Poster]
[Abstract]

We study minimax strategies for the online prediction problem with
expert advice.
It has been conjectured that a simple adversary strategy,
called COMB, is near optimal in this game for any number of experts.
Our results and new insights make progress in this direction by showing that, up to a small additive term, COMB is minimax optimal in the finite-time three expert problem. In addition, we provide for this setting a new near minimax optimal COMB-based learner.
Prior to this work, learners obtaining the optimal multiplicative constant in their regret rate were known for this fundamental learning problem only when $K=2$ or $K\to \infty$.
We characterize, for
$K=3$, the regret of the game as
$\sqrt{8T/(9\pi)} \pm \log^2(T)$,
which gives for the first time the optimal constant in the leading ($\sqrt{T}$) term of the regret.

**2017:**
**Hit-and-Run for Sampling and Planning in Non-Convex Spaces,**
Yasin Abbasi-Yadkori, Peter Bartlett, Victor Gabillon & Alan Malek,
*AISTATS 2017*
[Paper]
[Abstract]

We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces. For sampling, we give the first analysis of Hit-and-Run in non-convex spaces and show that it mixes fast as long as certain smoothness conditions are satisfied. In particular, our analysis reveals an intriguing connection between fast mixing and the existence of smooth measure-preserving mappings from a convex space to the non-convex space. For planning, we show advantages of Hit-and-Run compared to state-of-the-art planning methods such as Rapidly-Exploring Random Trees.
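One Hit-and-Run step (pick a random direction through the current point, then move to a point on that line that lies inside the set) can be sketched as follows. This is a simplified variant, assuming a membership oracle `inside` and using rejection sampling along the line rather than exact uniform sampling over the chord; the names `hit_and_run_step` and `in_annulus` are illustrative, not from the paper.

```python
import random

def hit_and_run_step(x, inside, step=1.0, tries=50):
    """One Hit-and-Run step: draw a uniformly random direction,
    then propose points along the line through x in that direction,
    keeping the first one inside the (possibly non-convex) set."""
    d = len(x)
    # Random direction on the unit sphere (normalized Gaussian vector).
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(v * v for v in g) ** 0.5
    u = [v / norm for v in g]
    for _ in range(tries):
        t = random.uniform(-step, step)  # candidate offset along the line
        y = [xi + t * ui for xi, ui in zip(x, u)]
        if inside(y):
            return y
    return x  # stay put if no in-set point was found

# Toy non-convex set: an annulus 0.5 <= |y| <= 1 in the plane.
def in_annulus(y):
    r2 = y[0] ** 2 + y[1] ** 2
    return 0.25 <= r2 <= 1.0

random.seed(1)
x = [0.75, 0.0]
samples = [x := hit_and_run_step(x, in_annulus) for _ in range(1000)]
```

The annulus is a simple example of a non-convex set where the chain can still move freely, matching the intuition that smoothness of the set (here, a thick ring rather than a thin sliver) is what governs mixing.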

**Contact:**
victorgabillon *at* gmail *dot* com