[Corpora-List] Distribution-aware generation of noun phrases

ngonga at informatik.uni-leipzig.de
Tue Aug 28 08:37:29 CEST 2012


Dear all,

I'm working on computing the similarity of entity labels (i.e., noun phrases) across languages, and I would like to study how several approaches to this task behave on corpora of increasing size. To that end, I would like to generate corpora in a way that reflects the distribution of the n-grams in each of the reference languages. Is anyone aware of a tool (preferably in Java) that does something similar?
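
To make the request more concrete, here is a minimal sketch of one possible reading of it, not an existing tool. It assumes word-level bigrams estimated from a seed list of labels and generates new labels by sampling each token in proportion to how often it followed the previous one. The class name BigramLabelGenerator, its methods, and the boundary markers are all hypothetical, and a real solution might well use character n-grams or higher orders instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: samples labels from a word-bigram model of seed labels.
public class BigramLabelGenerator {
    // Assumed boundary markers; they must not occur as real tokens.
    private static final String START = "<s>";
    private static final String END = "</s>";

    // For each token, every token observed right after it, with repetitions.
    // A uniform draw from such a list follows the observed bigram frequencies.
    private final Map<String, List<String>> successors = new HashMap<>();
    private final Random random;

    public BigramLabelGenerator(long seed) {
        this.random = new Random(seed);
    }

    // Collects bigram observations from whitespace-tokenised seed labels.
    public void train(List<String> labels) {
        for (String label : labels) {
            String trimmed = label.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String previous = START;
            for (String token : trimmed.split("\\s+")) {
                successors.computeIfAbsent(previous, k -> new ArrayList<>()).add(token);
                previous = token;
            }
            successors.computeIfAbsent(previous, k -> new ArrayList<>()).add(END);
        }
    }

    // Generates one label of at most maxTokens tokens.
    public String generate(int maxTokens) {
        StringBuilder label = new StringBuilder();
        String current = START;
        for (int i = 0; i < maxTokens; i++) {
            List<String> candidates = successors.get(current);
            if (candidates == null) {
                break; // untrained model: nothing to sample from
            }
            String next = candidates.get(random.nextInt(candidates.size()));
            if (next.equals(END)) {
                break;
            }
            if (label.length() > 0) {
                label.append(' ');
            }
            label.append(next);
            current = next;
        }
        return label.toString();
    }

    // Generates a corpus of the requested size, one label per entry.
    public List<String> generateCorpus(int size, int maxTokens) {
        List<String> corpus = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            corpus.add(generate(maxTokens));
        }
        return corpus;
    }
}
```

Calling train once per language on that language's seed labels and then generateCorpus with growing sizes would give corpora of increasing size whose bigram statistics approximate those of the seeds. Storing successors with repetitions keeps the sampling step trivial at the cost of memory; a count table would be the more economical choice for large seed sets.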

Cheers, Axel



