 - Get frequencies for sequences of words (n-grams), i.e. sequences of 2, 3, 4, ... words.

 - Make character processing independent of the locale.

 - Optionally use Recode to translate between different charsets?
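
A rough sketch of the word-sequence item, written in Python for illustration only and not taken from this project. The helper name `ngram_frequencies` is hypothetical. The sketch assumes that words are separated by whitespace and compared case-insensitively, and the real tokenizer may well differ.

```python
from collections import Counter


def ngram_frequencies(text, n):
    """Count every run of n consecutive words in text (hypothetical helper)."""
    # Assumption: words are whitespace-separated and compared case-insensitively.
    words = text.lower().split()
    # Slide a window of width n over the word list; each window is one n-gram.
    # If the text has fewer than n words, the range is empty and so is the result.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
```

For example, `ngram_frequencies("a b a b", 2)` counts the pair `("a", "b")` twice and `("b", "a")` once. Keeping each sequence as a tuple lets one function serve for 2, 3, 4 or more words.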
 
