Dialogue modeling via hash functions
Abstract
We propose a novel machine-learning framework for dialogue modeling that uses representations based on hash functions. More specifically, each person's response is represented by a binary hashcode in which each bit reflects the presence or absence of a certain text pattern in the response. Hashcodes serve as compressed text representations, allowing for efficient similarity search. Moreover, the hashcode of one person's response can be used as a feature vector for predicting the hashcode that represents another person's response. The proposed hashing model of dialogue is obtained by maximizing a novel lower bound on the mutual information between the hashcodes of consecutive responses. We apply our approach in the psychotherapy domain, evaluating its effectiveness on a real-life dataset of therapy sessions with patients suffering from depression; we also model transcripts of interview sessions between Larry King (television host) and his guests.
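To make the representation concrete, the following is a minimal sketch of a binary hashcode in which each bit records whether a text pattern occurs in a response, together with a Hamming-distance similarity search over such codes. It is an illustration under stated assumptions, not the method of the paper: the pattern list `PATTERNS` and the helpers `hashcode`, `hamming`, and `nearest` are hypothetical, and a fixed keyword list stands in for hash functions that the abstract describes as obtained by maximizing a mutual-information lower bound.

```python
from typing import List, Sequence

# Hypothetical patterns, chosen for illustration only. They stand in for
# the hash functions of the actual model, which are not hand-written keywords.
PATTERNS: Sequence[str] = ("feel", "sleep", "work", "family")


def hashcode(response: str, patterns: Sequence[str] = PATTERNS) -> List[int]:
    """Return one bit per pattern: 1 if the pattern occurs in the response."""
    text = response.lower()
    return [1 if pattern in text else 0 for pattern in patterns]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length hashcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(query: str, candidates: Sequence[str]) -> str:
    """Return the candidate whose hashcode is closest to the query's hashcode."""
    query_code = hashcode(query)
    return min(candidates, key=lambda c: hamming(query_code, hashcode(c)))


if __name__ == "__main__":
    print(hashcode("I feel tired and cannot sleep"))
    print(nearest("I can't sleep", ["Work is stressful", "How long do you sleep?"]))
```

Because comparison happens on short bit vectors rather than on raw text, the search cost depends on the code length and not on the length of the responses, which is the sense in which hashcodes act as compressed representations here.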