Irene Ko, Sihui Dai, et al.
NeurIPS 2024
The increased use of personal assistants has made question answering a common mode of user-system interaction. In these systems, implicit feedback, such as a user clicking on a link provided by the QA system, is easy to observe but can be noisy. Explicit feedback on the response, by contrast, is rare but more valuable. To address this, the paper proposes a new stochastic multi-armed bandit model that accounts for both types of feedback: noisy rewards and sparse rewards. The model is studied in both the classical and the contextual bandit settings, and an efficient algorithm based on the UCB framework is proposed and analyzed. The algorithm is evaluated through empirical studies on various reward distributions, as well as on a real-world dataset and application.
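For readers unfamiliar with the UCB framework this abstract builds on, here is a minimal sketch of the standard UCB1 index policy on simulated Bernoulli arms. It is not the paper's algorithm. It uses a single reward signal and does not model the combination of noisy implicit and sparse explicit feedback. The function name `ucb1`, the arm means, and the horizon are illustrative assumptions.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run standard UCB1 on simulated Bernoulli arms (illustrative only).

    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls per arm
    totals = [0.0] * k  # summed observed rewards per arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n_i).
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        # Simulated reward; a real system would observe user feedback here.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)
```

The exploration bonus shrinks as an arm is pulled more often, so over time most pulls concentrate on the arm with the highest mean (here the third arm).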
Henrik Nolte, Miriam Rateike, et al.
FAccT 2025
Katelyn Morrison, Zahra Ashktorab, et al.
AAAI 2025
George Kour, Itay Nakash, et al.
ACL 2025