Conference paper

THEMES: An Offline Apprenticeship Learning Framework for Evolving Reward Functions

Abstract

Apprenticeship learning (AL) aims to induce decision-making policies by observing and imitating expert demonstrations. Existing AL approaches typically rely on online interactions and assume that the demonstrations follow a single reward function. In real-world human-centric applications, however, policies are usually learned offline, from demonstrations driven by multiple reward functions that evolve over time. To address these challenges, we introduce a novel AL framework: Time-aware Hierarchical EM Energy-based Sub-trajectory (THEMES) clustering. We evaluate the effectiveness of THEMES in two challenging human-centric domains: healthcare and education. Our experimental results across multiple datasets show that THEMES can accurately induce policies and outperforms competitive baselines and ablations, indicating its potential for tackling a broad range of complex, real-world human-centric tasks.