To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that
consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets
to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset
consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments that (1)
compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard
part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.
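
As a rough illustration of what a tagset mapping does, the sketch below converts a sentence tagged with a treebank-specific tagset into universal categories. It is a minimal example under stated assumptions, not the released resource: the dictionary covers only a handful of English Penn Treebank tags, and the names `PTB_TO_UNIVERSAL`, `to_universal`, and the `fallback` parameter are hypothetical, chosen here for illustration.

```python
# The twelve universal categories; "." is punctuation and "X" is the catch-all.
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Illustrative fragment only: a few Penn Treebank tags, not the full mapping.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON",
    "DT": "DET", "IN": "ADP", "CD": "NUM",
    "CC": "CONJ", "RP": "PRT", ".": ".",
}


def to_universal(tagged_sentence, mapping, fallback="X"):
    """Replace each fine-grained tag with its universal category.

    Tags missing from `mapping` fall back to the catch-all category
    (an assumption of this sketch, not a documented behavior).
    """
    return [(word, mapping.get(tag, fallback)) for word, tag in tagged_sentence]


sentence = [("The", "DT"), ("dogs", "NNS"), ("barked", "VBD"), (".", ".")]
converted = to_universal(sentence, PTB_TO_UNIVERSAL)
print(converted)
```

Because every treebank is mapped into the same small label set, tags produced this way can be compared or shared across languages, which is what the cross-lingual experiments rely on.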
