[Corpora-List] Looking for Corpora in: English, Swedish, Polish, Italian, Finnish, Estonian, Hungarian

Edyta Jurkiewicz-Rohrbacher edytaj at gmail.com
Sun Mar 23 18:50:04 CET 2014


Dear Marina,

you can get access to quite decent corpora of Finnish from The Language Bank of Finland. For that, however, you would need to register (which is pretty simple); the link is here: http://www.csc.fi/english/research/sciences/linguistics/index_html

Other options are:

- the corpus of the Institute for the Languages of Finland, which also contains some older texts: http://kaino.kotus.fi/korpus/meta/korpus_coll_rdf.xml
- Project Gutenberg.

For Polish, there is the National Corpus of Polish: http://nkjp.pl/index.php?page=11&lang=1

You might also get some other ideas for finding texts by checking OPUS: http://opus.lingfil.uu.se/

InterCorp: http://ucnk.ff.cuni.cz/intercorp/ and

ParaSol: http://parasol.unibe.ch/

which are quite massive multi-lingual corpora.

All the best, Edyta Jurkiewicz-Rohrbacher

2014-03-23 18:12 GMT+01:00 Ralf Steinberger < ralf.steinberger at jrc.ec.europa.eu>:


> Dear Marina,
>
>
>
> At the JRC's Language Technology page
> http://ipsc.jrc.ec.europa.eu/index.php?id=61, you find parallel corpora
> for all the languages you are searching for, and more.
>
>
>
> All the best,
>
>
>
> Ralf
>
>
>
> *Ralf Steinberger*
>
> European Commission - Joint Research Centre (JRC)
>
>
>
> *From:* corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] *On Behalf
> Of *Marina Santini
> *Sent:* 23 March 2014 15:26
> *To:* corpora at uib.no; Marina Santini
> *Subject:* [Corpora-List] Looking for Corpora in: English, Swedish,
> Polish, Italian, Finnish, Estonian, Hungarian
>
>
>
> Hi,
>
>
> I am looking for corpora of any genre in the following languages: English,
> Swedish, Polish, Italian, Finnish, Estonian, and Hungarian.
> I am already aware of a number of corpora (several posts in the WebGenre
> blog are dedicated to the dissemination of corpora-related information).
> These corpora, though, are mostly in English. I would like now to focus on:
> 1) additional languages and 2) additional genres, such as search query
> logs, TV scripts, emails, tweets, WhatsApp messages, etc.
> All genres are welcome! The only requirement is: corpora must be
> free and publicly available. Everybody must be able to replicate or extend
> experiments using the same corpora/datasets.
>
> The purpose of the experiments is to explore cross-linguality in different
> settings. Please read the use cases in the blog post to get an idea of
> the type of communicative situations under investigation (
> http://www.forum.santini.se/2014/03/looking-for-corpora-to-explore-cross-linguality/
> )
>
>
> Thanx in advance for your suggestions and pointers.
>
> --
>
> Marina Santini
>
> http://www.forum.santini.se
> http://www.linkedin.com/groups/WebGenre-R-D-Group-4301498
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>