Zeta hull pursuits: Learning nonconvex data hulls
Yuanjun Xiong, Wei Liu, et al.
NeurIPS 2014
Cross-language learning allows one to use training data from one language to build models for a different language. Many approaches to bilingual learning require word-level alignments between the sentences of parallel corpora. In this work we explore autoencoder-based methods for cross-language learning of vectorial word representations that are coherent between two languages, without relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can learn high-quality word representations and do without word alignments entirely. We empirically investigate our approach on cross-language text classification, where a classifier trained on one language (e.g., English) must generalize to another (e.g., German). In experiments on three language pairs, our approach achieves state-of-the-art performance, outperforming a method that exploits word alignments as well as a strong machine translation baseline.
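To make the reconstruction objective in the abstract concrete, here is a minimal sketch (not the authors' released code) of a bilingual bag-of-words autoencoder in PyTorch: each language's bag-of-words vector is encoded into a shared hidden space, and both languages' bags-of-words are reconstructed from either encoding, giving within-language and cross-language reconstruction terms. The vocabulary sizes, hidden width, loss choice, and all function names here are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the idea: reconstruct aligned sentence pairs' binary
# bag-of-words vectors within and across languages through a shared code.
# V_EN, V_DE, H, `step`, and the BCE loss are assumptions for illustration.
import torch
import torch.nn as nn

V_EN, V_DE, H = 5000, 5000, 128  # assumed vocab sizes and hidden width

enc_en = nn.Linear(V_EN, H)      # English encoder
enc_de = nn.Linear(V_DE, H)      # German encoder
dec_en = nn.Linear(H, V_EN)      # English decoder
dec_de = nn.Linear(H, V_DE)      # German decoder

params = [p for m in (enc_en, enc_de, dec_en, dec_de) for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()  # binary bag-of-words targets

def step(x_en, x_de):
    """One update on a batch of aligned sentence pairs (binary BoW vectors)."""
    h_en = torch.sigmoid(enc_en(x_en))
    h_de = torch.sigmoid(enc_de(x_de))
    # Within-language reconstructions (en->en, de->de) plus
    # cross-language reconstructions (en->de, de->en).
    loss = (loss_fn(dec_en(h_en), x_en) + loss_fn(dec_de(h_de), x_de)
            + loss_fn(dec_de(h_en), x_de) + loss_fn(dec_en(h_de), x_en))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage example: one update on a random batch of 8 "aligned" pairs.
x_en = (torch.rand(8, V_EN) < 0.01).float()
x_de = (torch.rand(8, V_DE) < 0.01).float()
print(step(x_en, x_de))
```

Reconstructing each language from the other language's encoding is what pushes the two encoders toward a shared representation; the within-language terms keep each encoder faithful to its own vocabulary.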
David P. Woodruff
NeurIPS 2014
Marek Petrik, Dharmashankar Subramanian
NeurIPS 2014
Sarath Chandar, Mitesh M. Khapra, et al.
Neural Computation