Manual


This is an effort to build a multilingual corpus of CCG annotations for evaluating derivation projection algorithms and cross-lingually trained parsers.

We adopt a flavor of CCG that closely follows CCGrebank, because a large volume of English training data is available in this format. Having test data in multiple languages in a similar format will enable direct comparison of the algorithms and parsers under evaluation.

The annotation format of CCGrebank, however, has only been specified for English, and it is not always obvious how to apply it to other languages. This manual gives guidelines, sorted alphabetically by linguistic phenomenon. It was compiled by the first author based on:

  1. corresponding examples from CCGrebank, where available
  2. decisions taken during discussions among the authors/annotators prior to the main round of annotation
  3. decisions made by the majority of annotators during the main round of annotation