We present novel methods to construct compact natural language lexicons within
a graph-based semi-supervised learning framework, which is well suited to
propagating soft labels from seed data onto new natural language types. To
achieve compactness, we induce sparse measures at graph vertices by
incorporating sparsity-inducing penalties in Gaussian and entropic pairwise
Markov networks constructed from labeled and unlabeled data. Sparse measures
are desirable for high-dimensional multi-class learning problems such as the
induction of labels on natural language types, each of which is typically
associated with only a few labels. On two lexicon expansion problems, our
approach produces significantly smaller lexicons and obtains better predictive
performance than standard graph-based learning methods.
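
To illustrate how a sparsity-inducing penalty can yield sparse measures at graph vertices, the following is a minimal sketch, not the exact objective or optimizer used in this work. It assumes a squared-loss (Gaussian-style) objective with a seed-fit term, an edge-smoothness term weighted by `mu`, and an l1 penalty weighted by `lam` under a nonnegativity constraint, minimized by projected gradient descent. The function name `propagate_sparse`, the toy chain graph, and all parameter values are hypothetical choices made for illustration.

```python
def propagate_sparse(W, seeds, num_labels, mu=0.5, lam=0.2, step=0.1, iters=500):
    """Projected gradient descent on an assumed objective:

        sum_{v in seeds} ||q_v - r_v||^2
      + mu  * sum_{u < v} W[u][v] * ||q_u - q_v||^2
      + lam * sum_v sum_y q_v[y],        subject to q_v[y] >= 0.

    W is a symmetric adjacency matrix (list of lists), seeds maps a
    vertex index to its seed label vector r_v. With q >= 0, the l1
    penalty is linear, so it adds the constant lam to every gradient
    entry; clipping at zero is what produces exact zeros.
    The step size is assumed small enough for convergence.
    """
    n = len(W)
    q = [[0.0] * num_labels for _ in range(n)]
    for _ in range(iters):
        new_q = []
        for v in range(n):
            row = []
            for y in range(num_labels):
                grad = lam  # gradient of the l1 term on the nonnegative orthant
                if v in seeds:
                    grad += 2.0 * (q[v][y] - seeds[v][y])  # seed-fit term
                for u in range(n):
                    # smoothness term over graph neighbors
                    grad += 2.0 * mu * W[v][u] * (q[v][y] - q[u][y])
                # gradient step followed by projection onto q >= 0
                row.append(max(0.0, q[v][y] - step * grad))
            new_q.append(row)
        q = new_q  # simultaneous update of all vertices
    return q


# Toy chain graph 0 - 1 - 2 - 3 with three labels; vertices 0 and 3 are seeds.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
seeds = {0: [1.0, 0.0, 0.0], 3: [0.0, 1.0, 0.0]}
q = propagate_sparse(W, seeds, num_labels=3)
for v, row in enumerate(q):
    print(v, [round(x, 3) for x in row])
```

In this toy setting the third label, which no seed supports, receives exactly zero mass at every vertex, and each seed assigns exactly zero mass to the label of the opposite seed. Without the penalty (`lam=0`), the smoothness term alone would spread small nonzero mass for both seeded labels across all vertices.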
