Global Explanations for Multivariate time series models
Vijay Arya, Diptikalyan Saha, et al.
CODS-COMAD 2023
For large-vocabulary handwriting-recognition applications, such as note-taking, word-level language modeling is key both to constraining the recognizer's search and to scoring hypothesized texts. We discuss the creation of a word-unigram language model, which associates a probability with each individual word. Typically, such models are derived from a large, diverse text corpus. We describe a three-stage algorithm for determining a word unigram from such a corpus. The first stage is tokenization, the segmenting of a corpus into words. In the second stage, we select for the model a subset of the distinct words found during tokenization. We discuss the complexities of these two stages. In the third stage, we create recognizer-specific data structures for the word set and unigram. Applying our method to a 600-million-word corpus, we generate a 50,000-word model that eliminates 45% of the word-recognition errors made by a baseline system employing only a character-level language model. © 2001 IEEE.
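The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `tokenize` and `build_unigram`, the regex-based tokenizer, and the frequency-based word selection are all assumptions made for illustration, and a plain dictionary stands in for the recognizer-specific data structures, which the abstract does not specify.

```python
import re
from collections import Counter


def tokenize(text):
    """Stage 1 (assumed rule): lowercase the text and split it into
    runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_unigram(corpus, vocab_size):
    """Stages 2 and 3 (assumed): keep the most frequent words and
    return a word -> probability mapping over the kept words."""
    counts = Counter(tokenize(corpus))
    # Stage 2: select a subset of the distinct words, here by raw frequency.
    kept = counts.most_common(vocab_size)
    # Stage 3: a dict stands in for the recognizer's own data structures.
    # Probabilities are renormalized over the kept words only.
    total = sum(count for _, count in kept)
    return {word: count / total for word, count in kept}


# Toy corpus, hypothetical data for illustration only.
corpus = "the cat sat on the mat and the dog sat too"
model = build_unigram(corpus, vocab_size=2)
```

On this toy corpus, `model` keeps only "the" and "sat". A real system would need a more careful tokenizer (punctuation, numbers, capitalization) and a selection rule suited to the recognizer, which are the complexities the abstract alludes to.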