Question Answering System with Sparse and Noisy Feedback
Abstract
The rise of personal assistants has made question answering (QA) a very popular mechanism for user-system interaction. In a QA system, implicit feedback (for example, a user clicking a link returned by the system) is easy to observe but noisy. Explicit feedback on the quality of a response, by contrast, is rarely received but more valuable. Motivated by the practical need to process these two types of reward in QA systems, this paper proposes and investigates a new stochastic multi-armed bandit model in which each action has both a noisy reward and a sparse reward. We study this problem in the contextual bandit setting, and we propose and analyze efficient algorithms based on the LinUCB framework. Our algorithms are verified by empirical studies on various reward distributions, a real-world dataset, and a real-world application.
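To make the setting concrete, the following is a minimal sketch of a standard disjoint LinUCB learner in a simulated environment with one always-observed noisy signal and one rarely observed clean signal. It is not the algorithm proposed in this paper. The rule in `combine_feedback` (use the explicit signal when it is present, otherwise fall back to the implicit one), the two-arm environment, and all parameter values (`p_explicit`, `p_flip`, `true_theta`) are hypothetical choices made only for illustration.

```python
import math
import random


class LinUCBArm:
    """One arm of standard disjoint LinUCB; stores A^{-1} and b."""

    def __init__(self, dim):
        # A starts as the identity matrix, so A^{-1} is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, v):
        # Multiply A^{-1} by the vector v.
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        theta = self._matvec(self.b)            # ridge estimate: A^{-1} b
        a_inv_x = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after A <- A + x x^T.
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]


def combine_feedback(implicit, explicit):
    """Hypothetical rule: prefer the explicit signal whenever it is observed."""
    return explicit if explicit is not None else implicit


def simulate(rounds=500, seed=0, alpha=1.0, p_explicit=0.1, p_flip=0.2):
    """Toy environment (assumed, not from the paper): 2 arms, 2 context types."""
    rng = random.Random(seed)
    true_theta = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical per-arm parameters
    arms = [LinUCBArm(2), LinUCBArm(2)]
    picks = []
    for _ in range(rounds):
        x = rng.choice([[1.0, 0.0], [0.0, 1.0]])
        scores = [arm.ucb(x, alpha) for arm in arms]
        a = scores.index(max(scores))
        p = sum(t * xi for t, xi in zip(true_theta[a], x))
        clean = 1.0 if rng.random() < p else 0.0
        # Implicit feedback: always observed, flipped with probability p_flip.
        implicit = 1.0 - clean if rng.random() < p_flip else clean
        # Explicit feedback: clean, but observed only with probability p_explicit.
        explicit = clean if rng.random() < p_explicit else None
        arms[a].update(x, combine_feedback(implicit, explicit))
        picks.append(a)
    return picks


if __name__ == "__main__":
    chosen = simulate()
    print("fraction of rounds choosing arm 0:", chosen.count(0) / len(chosen))
```

The Sherman-Morrison update avoids an explicit matrix inversion at every round, which keeps the sketch free of external dependencies. A production implementation would more likely rely on a linear algebra library.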