Conference paper
Semantic tokenization of verbalized numbers in language modeling
Xiaoqiang Luo, Martin Franz
ICSLP 2000
Search algorithms in most current text retrieval systems use index data structures extracted from the original text documents. In this paper, we focus on reducing the size of these indices by reducing the space dedicated to storing term frequencies. In experiments using the TREC Ad Hoc [2, 3] corpora and query sets, we show that the term frequency can be stored in only two bits without decreasing retrieval performance.
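To make the two-bit idea concrete, here is a minimal sketch of how a term frequency could be quantized to a 2-bit code and packed four codes to a byte. The abstract does not say how frequencies are mapped to codes, so the bucket boundaries in `THRESHOLDS` and the helper names `quantize_tf`, `pack_codes`, and `unpack_code` are illustrative assumptions, not the authors' method.

```python
from typing import List

# Hypothetical bucket boundaries (an assumption, not taken from the paper):
# code 0 -> tf == 1, code 1 -> tf == 2, code 2 -> tf in 3..4, code 3 -> tf >= 5
THRESHOLDS = (1, 2, 4)


def quantize_tf(tf: int) -> int:
    """Map a positive term frequency to a 2-bit code in the range 0..3."""
    if tf < 1:
        raise ValueError("an indexed term must occur at least once")
    for code, upper in enumerate(THRESHOLDS):
        if tf <= upper:
            return code
    return 3  # every frequency above the last threshold shares the top code


def pack_codes(codes: List[int]) -> bytes:
    """Pack 2-bit codes four to a byte, with the first code in the lowest bits."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        out[i // 4] |= (code & 0b11) << (2 * (i % 4))
    return bytes(out)


def unpack_code(packed: bytes, i: int) -> int:
    """Read back the i-th 2-bit code from a packed byte string."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


if __name__ == "__main__":
    frequencies = [1, 2, 3, 4, 5, 100]
    codes = [quantize_tf(tf) for tf in frequencies]
    packed = pack_codes(codes)
    print(codes, len(packed))
```

Under this packing, six postings take two bytes for their frequency fields. A ranking function would then score with a representative value per code rather than the exact count; how that value is chosen is left open here.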