
Define the scope of the corpus
Identify any structural limitations on which texts belong in the corpus (e.g., exclude texts shorter than n characters, exclude non-English texts, etc.). Decide whether the entire corpus or only a subset will be hand-coded. If only a subset, specify how it will be selected (randomly, representatively, etc.).
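As an illustration only, the scoping rules above might be sketched in Python as follows. The function names (define_scope, select_for_hand_coding), the default threshold of 100 characters, and the optional keep predicate are all hypothetical choices for this sketch and are not part of the software described here; the sketch also assumes the corpus is held in memory as a list of strings.

```python
import random


def define_scope(texts, min_chars=100, keep=None):
    """Return the texts that satisfy the structural limitations.

    min_chars: hypothetical minimum length (the "n characters" rule).
    keep: optional caller-supplied predicate, e.g. a language check;
          texts for which it returns False are excluded.
    """
    in_scope = []
    for text in texts:
        if len(text.strip()) < min_chars:
            continue  # too short to include
        if keep is not None and not keep(text):
            continue  # rejected by the caller's rule (e.g. non-English)
        in_scope.append(text)
    return in_scope


def select_for_hand_coding(texts, k, seed=0):
    """Draw a simple random sample of up to k texts for hand-coding.

    A fixed seed makes the selection reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(texts, min(k, len(texts)))


if __name__ == "__main__":
    corpus = ["too short", "a" * 200, "b" * 150, "c" * 50]
    scoped = define_scope(corpus, min_chars=100)
    subset = select_for_hand_coding(scoped, k=1, seed=0)
    print(len(scoped), len(subset))
```

This draws a simple random subset. A representative subset would instead need sampling within whatever groups matter for the analysis, which this sketch does not attempt.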



Gary King 2011-07-12