Over-splitting would increase the total word count, but reduce the count of unique words. The huge number of unique words that Emmanuel Prochasson found was probably the result of grouping long Kanji strings into a single so-called noun.
For example, the English 'life insurance company employee' would count as four words, but the German 'Lebensversicherungsgesellschaftsangestellter' would count as just one.
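
The effect can be illustrated with a toy sketch in Python. The corpus below is made up for illustration (it is not Emmanuel Prochasson's data), and the helper name `token_and_type_counts` is hypothetical. It treats each compound either as a single token, as German orthography would, or as a sequence of component words, as English would:

```python
def token_and_type_counts(tokens):
    """Return (total word count, unique word count) for a list of tokens."""
    return len(tokens), len(set(tokens))

# Made-up corpus: six compounds built from only four component words.
compounds = [
    ["life", "insurance", "company", "employee"],
    ["life", "insurance", "company"],
    ["insurance", "company", "employee"],
    ["life", "insurance"],
    ["insurance", "company"],
    ["company", "employee"],
]

# Segmentation A: every compound is kept as one long token (no splitting).
unsplit_tokens = ["".join(parts) for parts in compounds]

# Segmentation B: every compound is split into its component words.
split_tokens = [word for parts in compounds for word in parts]

unsplit_total, unsplit_unique = token_and_type_counts(unsplit_tokens)
split_total, split_unique = token_and_type_counts(split_tokens)

print("unsplit:", unsplit_total, "tokens,", unsplit_unique, "unique")
print("split:  ", split_total, "tokens,", split_unique, "unique")
```

With this toy data, keeping the compounds whole gives 6 tokens that are all unique, while splitting them gives 16 tokens but only 4 unique words. Real corpora will differ in scale, but the direction of the effect is the same.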
John Sowa