Deep reinforcement learning has achieved great successes in recent years; however, one main challenge is sample inefficiency. In this paper, we focus on how to use action guidance by means of a non-expert demonstrator to improve sample efficiency in a domain with sparse, delayed, and possibly deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. We propose a new framework where even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.