Saurabh Paul, Christos Boutsidis, et al.
JMLR
This paper presents a case study in which the TD(λ) algorithm for training connectionist networks, proposed in (Sutton, 1988), is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, networks are able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. The hidden units in these networks have apparently discovered useful features, a longstanding goal of computer games research. Furthermore, when a set of handcrafted features is added to the input representation, the resulting networks reach a near-expert level of performance, and have achieved good results in tests against world-class human play.
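For orientation, the TD(λ) update the abstract refers to can be sketched as follows. This is a minimal illustrative sketch of TD(λ) with eligibility traces for a linear value function, not the connectionist network or backgammon setup from the paper; the hyperparameter values, the toy episode, and all names are assumptions made for the example.

```python
import numpy as np

# Minimal sketch of a TD(lambda) update with eligibility traces, in the
# spirit of Sutton (1988). Illustrative toy for a linear value estimator;
# the constants and random "positions" below are assumptions, not values
# from the paper.

ALPHA = 0.1    # learning rate
LAMBDA = 0.7   # trace-decay parameter (the "lambda" in TD(lambda))
GAMMA = 1.0    # no discounting within an episode, as in episodic games

rng = np.random.default_rng(0)
n_features = 8
w = np.zeros(n_features)       # weights of a linear value estimator
trace = np.zeros(n_features)   # eligibility trace, one entry per weight

def value(x, w):
    """Linear value estimate V(s) = w . x(s)."""
    return w @ x

# One simulated episode: random feature vectors standing in for board
# positions, with reward only at the terminal step (win = 1, loss = 0).
episode = [rng.normal(size=n_features) for _ in range(10)]
final_reward = 1.0

for t in range(len(episode)):
    x = episode[t]
    if t + 1 < len(episode):
        # TD error: reward is zero until the game ends, so the target
        # is the value estimate of the successor position.
        delta = GAMMA * value(episode[t + 1], w) - value(x, w)
    else:
        # Terminal transition: the target is the game outcome itself.
        delta = final_reward - value(x, w)
    # Decay and accumulate the trace, then move every weight in
    # proportion to how "eligible" it is for credit from this TD error.
    trace = GAMMA * LAMBDA * trace + x
    w += ALPHA * delta * trace

print("learned weights:", w)
```

Setting LAMBDA to 0 recovers one-step TD(0), while LAMBDA of 1 credits every earlier position in the episode equally, which is the trade-off the trace-decay parameter controls.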
Joxan Jaffar
Journal of the ACM
Rakesh Mohan, Ramakant Nevatia
IEEE Transactions on Pattern Analysis and Machine Intelligence
Cristina Cornelio, Judy Goldsmith, et al.
JAIR