A robust chunker can drastically reduce
the complexity of parsing natural
language text. Chunking for Indian languages requires a novel approach because
of the relatively unrestricted order of
words within a word group. A computational framework for chunking based
on valency theory and feature structures
is described here. The paper also
draws an analogy of chunk formation
in free word order languages with the
bonding of atoms, radicals or molecules
to form complex chemical structures. The
unavailability of large annotated corpora
forces one to adopt a statistical approach
to achieve the task of word grouping
for Indian language text. A chunker
has been implemented for Bengali using
this approach and achieves good
accuracy.
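To make the bonding analogy concrete, here is a minimal toy sketch. It does not reproduce the paper's framework. It assumes flat feature structures (string-valued dictionaries) and gives each head word a hypothetical integer valency, the number of modifier "bonds" it can accept. A modifier joins a group if it can bond to the group's head category, the head still has a free bond, and the two feature structures unify. Modifiers may appear before or after their head, which reflects the relatively free order inside a word group. The names `Token`, `unify` and `chunk`, the tags `N`/`ADJ`/`V`, the valency values and the romanized Bengali example are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Features = Dict[str, str]  # flat feature structure (assumption)


def unify(a: Features, b: Features) -> Optional[Features]:
    """Unify two flat feature structures; None signals a value clash."""
    merged = dict(a)
    for key, value in b.items():
        if merged.get(key, value) != value:
            return None
        merged[key] = value
    return merged


@dataclass
class Token:
    word: str
    category: str        # hypothetical tag set, e.g. "N", "ADJ", "V"
    features: Features
    valency: int = 0     # free bonds a head offers (hypothetical values)
    bonds_to: str = ""   # category a modifier bonds to; "" marks a head


def chunk(tokens: List[Token]) -> List[List[str]]:
    """Greedy left-to-right grouping driven by valency and unification."""
    groups: List[List[str]] = []
    words: List[str] = []
    head: Optional[Token] = None
    feats: Features = {}
    free = 0
    pending: List[Token] = []  # modifiers seen before their head

    def close() -> None:
        # Emit the current group (if any) and reset all group state.
        nonlocal words, head, feats, free, pending
        if words:
            groups.append(words)
        words, head, feats, free, pending = [], None, {}, 0, []

    for tok in tokens:
        merged = unify(feats, tok.features)
        if not tok.bonds_to:  # tok is a head
            # A head may absorb earlier modifiers only if it has no rival
            # head, all modifiers bond to its category, and it has bonds left.
            fits = (head is None and merged is not None
                    and all(m.bonds_to == tok.category for m in pending)
                    and len(pending) <= tok.valency)
            if not fits:
                close()
                merged = dict(tok.features)
            head, feats = tok, merged
            free = tok.valency - len(pending)
            pending = []
            words.append(tok.word)
        else:  # tok is a modifier looking for a head
            if head is not None:
                fits = (tok.bonds_to == head.category and free > 0
                        and merged is not None)
            else:
                fits = merged is not None
            if not fits:
                close()
                merged = dict(tok.features)
            if head is None:
                pending.append(tok)  # wait for a head to the right
            else:
                free -= 1            # consume one of the head's bonds
            feats = merged
            words.append(tok.word)
    close()
    return groups


if __name__ == "__main__":
    # "sundor meyeti gan gaichhe" ("the beautiful girl is singing a song"),
    # with made-up tags, features and valencies.
    sentence = [
        Token("sundor", "ADJ", {}, bonds_to="N"),
        Token("meyeti", "N", {"num": "sg"}, valency=2),
        Token("gan", "N", {"num": "sg"}, valency=1),
        Token("gaichhe", "V", {"tense": "prog"}, valency=1),
    ]
    print(chunk(sentence))  # -> [['sundor', 'meyeti'], ['gan'], ['gaichhe']]
```

The greedy pass is only one possible way to close groups. A fuller treatment would likely search over alternative bondings and use richer, recursive feature structures.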
