Leveraging Wikipedia Characteristics for Search and Candidate Generation in Question Answering
Abstract
Most existing Question Answering (QA) systems adopt a type-and-generate approach to candidate generation that relies on a pre-defined domain ontology. This paper describes a type independent search and candidate generation paradigm for QA that leverages Wikipedia characteristics. This approach is particularly useful for adapting QA systems to domains where reliable answer type identification and type-based answer extraction are not available. We present a three-pronged search approach motivated by relations an answer-justifying title-oriented document may have with the question/answer pair. We further show how Wikipedia metadata such as anchor texts and redirects can be utilized to effectively extract candidate answers from search results without a type ontology. Our experimental results show that our strategies obtained high binary recall in both search and candidate generation on TREC questions, a domain that has mature answer type extraction technology, as well as on Jeopardy! questions, a domain without such technology. Our high-recall search and candidate generation approach has also led to high overall QA performance in Watson, our end-to-end system.