Cross-language information propagation for Arabic mention detection
Abstract
In the last two decades, significant effort has been put into annotating linguistic resources in several languages. Despite this effort, many languages still have only small amounts of such resources. This article presents and investigates a method for propagating information (specifically mention detection) from a resource-rich language into a relatively resource-poor language such as Arabic. Part of the investigation quantifies the contribution of the propagated information under different conditions, depending on the availability of resources in the target language. Experiments on the Arabic-English language pair show that one can achieve reasonably good performance by propagating information from a language with richer resources, such as English, into Arabic alone (that is, with no resources or models in Arabic itself). Furthermore, the results show that features propagated from English improve the performance of the Arabic system even when used in conjunction with all feature types built from Arabic resources. Experiments also show that using propagated features in conjunction with lexically derived features only (as can be obtained directly from a mention-annotated corpus) brings system performance to the level obtained in the target language by using features derived from many linguistic resources, thereby improving the system when such resources are not available. In addition to the Arabic-English language pair, we investigate the effectiveness of our approach on other language pairs such as Chinese-English and Spanish-English. © 2009 ACM.