Christopher S. Campbell, Paul P. Maglio
Int. J. Hum. Comput. Stud.
We present techniques for improving domain-specific translation quality when test data sets have a relatively high out-of-vocabulary (OOV) ratio. The key idea is to maximize vocabulary coverage without degrading translation quality. We maximize vocabulary coverage by segmenting each word into a sequence of morphemes of the form prefix*-stem-suffix*, and by adding a large amount of out-of-domain training corpora. To preserve the domain-specific meaning of words that occur in both the domain-specific and the out-of-domain training corpora, we assign a higher weight to the domain-specific corpus than to the out-of-domain corpora. IBM Arabic-to-English spoken language translation systems using these techniques demonstrated the best performance in the Open Data Track of the IWSLT2006 Evaluation Campaign.
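The two ideas in the abstract, prefix*-stem-suffix* segmentation and heavier weighting of the domain-specific corpus, can be illustrated with a minimal sketch. The abstract does not describe the actual segmenter or weighting scheme, so everything here is an assumption made for illustration: the affix inventories `PREFIXES` and `SUFFIXES`, the greedy stripping strategy, the `#` boundary markers, and the `in_domain_weight` parameter are all hypothetical.

```python
from collections import Counter

# Hypothetical affix inventories in a Latin transliteration (illustration only;
# not the inventories used by the system described above).
PREFIXES = ("w", "b", "l", "Al")
SUFFIXES = ("hm", "hA", "At", "h")


def segment(word, min_stem=2):
    """Greedily split a word into prefix* - stem - suffix* (assumed strategy)."""
    prefixes, suffixes = [], []
    stem = word

    # Strip prefixes from the left while a stem of at least min_stem remains.
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if stem.startswith(p) and len(stem) - len(p) >= min_stem:
                prefixes.append(p + "#")
                stem = stem[len(p):]
                stripped = True
                break

    # Strip suffixes from the right under the same minimum-stem constraint.
    stripped = True
    while stripped:
        stripped = False
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) - len(s) >= min_stem:
                suffixes.insert(0, "#" + s)
                stem = stem[:-len(s)]
                stripped = True
                break

    return prefixes + [stem] + suffixes


def weighted_counts(in_domain, out_of_domain, in_domain_weight=3.0):
    """Count morphemes, scaling in-domain occurrences by in_domain_weight."""
    counts = Counter()
    for sentences, weight in ((in_domain, in_domain_weight), (out_of_domain, 1.0)):
        for sentence in sentences:
            for word in sentence.split():
                for morpheme in segment(word):
                    counts[morpheme] += weight
    return counts
```

In this sketch, a stem seen only in the out-of-domain data still enters the vocabulary, which raises coverage, while statistics gathered from the domain-specific data dominate wherever both corpora contain the same morpheme. A real system would apply the weighting inside translation-model and language-model training rather than to raw counts.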
Jakita O. Thomas, Eric Mibuari, et al.
CHI 2011
Luís Henrique Neves Villaça, Sean Wolfgand Matsui Siqueira, et al.
SBSI 2023
James Fogarty, Scott E. Hudson, et al.
CHI 2004