I am looking for comparable corpora in as many languages as possible, most importantly English, Italian, German and Russian. The corpora should be suitable for vector space modeling, including neural network training (i.e., on the order of billions of words). We have already experimented with Wikipedia, so we are looking for additional corpora.
Thanks, Roi