We describe a novel approach for inducing unsupervised part-of-speech
taggers for languages that have no labeled training data, but have
translated text in a resource-rich language. Our method does not
assume any knowledge about the target language (in particular no
tagging dictionary is assumed), making it applicable to a wide array
of resource-poor languages. We use graph-based label propagation for
cross-lingual knowledge transfer and use the projected labels as
features in an unsupervised model (Berg-Kirkpatrick et al., 2010).
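To make the label-propagation step concrete, the sketch below shows a generic graph-based label propagation loop in the style of Zhu and Ghahramani: nodes with projected (seed) tag distributions stay clamped, and every other node repeatedly takes the weighted average of its neighbors' distributions. This is only an illustrative approximation under assumed inputs; the paper's actual graph construction and propagation objective differ in detail, and all names here (propagate_labels, edges, seed_labels) are hypothetical.

```python
# Illustrative sketch of graph-based label propagation (Zhu & Ghahramani style).
# Not the paper's exact objective; graph construction and seeding are assumed.
from collections import defaultdict


def propagate_labels(edges, seed_labels, tags, iterations=30):
    """Propagate POS tag distributions over a weighted similarity graph.

    edges: dict mapping node -> list of (neighbor, weight) pairs
    seed_labels: dict mapping node -> tag distribution (kept fixed, e.g. labels
                 projected across word alignments from the resource-rich side)
    tags: list of possible POS tags
    """
    # Initialize: seed nodes keep their projected distributions, others start uniform.
    uniform = {t: 1.0 / len(tags) for t in tags}
    dist = {n: dict(seed_labels.get(n, uniform)) for n in edges}

    for _ in range(iterations):
        new_dist = {}
        for node, neighbors in edges.items():
            if node in seed_labels:
                # Clamp nodes that received projected labels.
                new_dist[node] = dict(seed_labels[node])
                continue
            # Weighted average of the neighbors' current distributions.
            acc = defaultdict(float)
            total = 0.0
            for nb, w in neighbors:
                total += w
                for t, p in dist[nb].items():
                    acc[t] += w * p
            new_dist[node] = (
                {t: acc[t] / total for t in tags} if total > 0 else dict(uniform)
            )
        dist = new_dist
    return dist


# Toy usage with hypothetical nodes: "dog" carries a projected NOUN label and
# propagates it to the unlabeled target-language nodes it is connected to.
edges = {
    "perro": [("dog", 1.0), ("gato", 0.5)],
    "gato": [("perro", 0.5)],
    "dog": [("perro", 1.0)],
}
seeds = {"dog": {"NOUN": 1.0, "VERB": 0.0}}
result = propagate_labels(edges, seeds, tags=["NOUN", "VERB"])
```

The smoothed distributions produced by such a procedure are what the abstract refers to as projected labels; in the paper they are then used as features when training the unsupervised tagging model.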
Across eight European languages, our approach results in an average
absolute improvement of 10.4% over a state-of-the-art baseline, and
16.7% over vanilla hidden Markov models induced with the Expectation
Maximization algorithm.