We present methods to control the lexicon size when learning a Combinatory Categorial Grammar semantic parser. Existing methods incrementally expand the lexicon by greedily adding entries, considering a  single training datapoint at a time. We propose using corpus-level statistics for lexicon learning decisions. We introduce voting to globally consider adding entries to the lexicon, and pruning to remove entries no longer required to explain the training data. Our methods result in state-of-the-art performance on the task of executing sequences of natural language instructions, achieving up to 25% error reduction, with lexicons that are up to 70% smaller and are qualitatively less noisy.

Powered by liveSite