[Corpora-List] Looking for Corpora in: English, Swedish, Polish, Italian, Finnish, Estonian, Hungarian

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Sun Mar 23 18:12:13 CET 2014

Dear Marina,

On the JRC's Language Technology page (http://ipsc.jrc.ec.europa.eu/index.php?id=61), you will find parallel corpora for all the languages you are looking for, and more.

All the best,


Ralf Steinberger

European Commission - Joint Research Centre (JRC)

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Marina Santini
Sent: 23 March 2014 15:26
To: corpora at uib.no; Marina Santini
Subject: [Corpora-List] Looking for Corpora in: English, Swedish, Polish, Italian, Finnish, Estonian, Hungarian


I am looking for corpora of any genre in the following languages: English, Swedish, Polish, Italian, Finnish, Estonian, and Hungarian. I am already aware of a number of corpora (several posts in the WebGenre blog are dedicated to the dissemination of corpora-related information). These corpora, though, are mostly in English. I would now like to focus on: 1) additional languages and 2) additional genres, such as search query logs, TV scripts, emails, tweets, WhatsApp messages, etc. All genres are welcome! The only requirement is that the corpora must be free and publicly available, so that everybody is able to replicate or extend experiments using the same corpora/datasets.

The purpose of the experiments is to explore cross-linguality in different settings. Please read the use cases in the blog post to get an idea of the type of communicative situations under investigation (http://www.forum.santini.se/2014/03/looking-for-corpora-to-explore-cross-linguality/).

Thanks in advance for your suggestions and pointers.


Marina Santini


http://www.linkedin.com/groups/WebGenre-R-D-Group-4301498
