Introducing Trial-and-Error Exploration to Avoid Critical Failure for Efficient Reinforcement Learning
Abstract
The combination of deep reinforcement learning~(RL) and search has achieved human-level performance in board games and various control tasks. Such neuro-symbolic AI has been gaining increasing attention as a way to automate real-world tasks that would otherwise be impossible. However, the training process usually involves exhaustive exploration of a wide variety of possible scenarios, requiring extensive time and computational resources. To overcome this challenge, we propose an efficient RL algorithm that improves exploration. We introduce trial-and-error exploration, which revisits states where a critical mistake has occurred so that the agent can actively learn to avoid such failures. When the agent recognizes a failure, our method lets the agent retract its action and try different ones until a better action is found. Our evaluation on CartPole and the board game Othello demonstrates that our method outperforms DQN and AlphaZero, which do not have the trial-and-error scheme.
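As a rough illustration of the retract-and-retry mechanism described above, the Python sketch below shows one possible decision step under stated assumptions: the environment is assumed to expose snapshot/restore and failure-detection hooks (env.snapshot, env.restore, env.is_failure), and the agent interface (select_action, remember) is a hypothetical name introduced for this example; the paper's actual implementation may differ.

def trial_and_error_step(env, agent, state):
    # Save the environment state so a failing action can be retracted.
    # snapshot()/restore()/is_failure() are assumed hooks, not a standard API.
    snapshot = env.snapshot()
    tried = set()
    while len(tried) < env.num_actions:
        action = agent.select_action(state, exclude=tried)
        next_state, reward, done = env.step(action)
        if not env.is_failure(next_state):
            # A non-failing action was found; commit to it.
            return action, next_state, reward, done
        # Critical failure: record it as a negative example for learning,
        # then retract the action and try a different one.
        agent.remember(state, action, reward, next_state, done)
        tried.add(action)
        env.restore(snapshot)
    # Every available action leads to failure; fall back to the agent's
    # default choice so training can still proceed.
    action = agent.select_action(state)
    next_state, reward, done = env.step(action)
    return action, next_state, reward, done

One design point worth noting: because failing transitions are still stored via agent.remember before being retracted, the agent learns from the mistake without the episode actually committing to it, which is what distinguishes this scheme from ordinary rollback.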