[Corpora-List] comparable corpora in multiple languages

Vladimír Benko vladob at juls.savba.sk
Tue Feb 10 11:31:17 CET 2015


Dear Roi,


>
> I am looking for comparable corpora in as many languages as possible,
> but most importantly in English, Italian, German and Russian. The
> corpora should be suitable for vector space modeling including NN
> training (i.e. containing billions of words). We have already experimented
> with Wikipedia so we are looking for additional corpora.
>

you may want to consider the gigaword web corpora created within the framework of our Aranea project:

http://ucts.uniba.sk/aranea_about/

Best regards,

Vlado B, 10:30

-- Vladimír Benko

Slovak Academy of Sciences, Ľ. Štúr Institute of Linguistics, Panská 26, SK-81101 Bratislava

Tel +421-2-54431762 Fax -54431756


