We study bandit best-arm identification
with arbitrary and potentially adversarial rewards.
A simple random uniform
learner obtains the optimal rate of error in the
adversarial scenario. However, this
type of strategy is suboptimal when the rewards are sampled
stochastically. Therefore, we ask:

*Can we design a learner that
performs optimally in both the stochastic
and adversarial problems while not being aware of the nature of the rewards?*
First, we show that designing such a learner is impossible in general.
In particular, to be robust to adversarial rewards, we can
only guarantee optimal rates of
error on a subset of the stochastic problems. We give a lower bound
that characterizes the optimal rate in stochastic problems if the
strategy is constrained to be robust to adversarial rewards.
Finally, we design a simple parameter-free algorithm and show that its
probability of error matches (up to log factors) the lower bound in
stochastic problems, and it is also robust to adversarial ones.