[Corpora-List] Announcing the French WaCky corpus: frWaC

Janne Bondi Johannessen jannebj at iln.uio.no
Sat Apr 17 11:07:46 CEST 2010

Congratulations! I would also like to mention that we have developed a Norwegian web corpus: NoWaC v 1.0. The computational procedure used to collect the NoWaC corpus is largely based on the techniques used to build the corpora published by the WaCky initiative <http://wacky.sslmit.unibo.it/>. The NoWaC corpus was developed by Emiliano Guevara.

Search the corpus: http://www.tekstlab.uio.no/nowac/
Read about it here: http://www.hf.uio.no/tekstlab/nowac.html

It will be properly announced later.

Best, Janne Bondi Johannessen.

2010/4/8 Adriano Ferraresi <adriano at sslmit.unibo.it>

> Dear corpora members,
> we are happy to announce that we've recently completed work on frWaC, a new
> corpus resource for French.
> Like deWaC (for German), itWaC (for Italian) and ukWaC (for English), frWaC
> is a mega-corpus (~ 1.6 billion words) obtained by crawling and
> post-processing Web data. It is available both in a plain text version, and
> in an annotated version, which includes Part-of-Speech and lemma
> information. An earlier version of the corpus, and the procedure for its
> construction, are described here:
> Ferraresi, A., S. Bernardini, G. Picci and M. Baroni (2010) “Web Corpora
> for Bilingual Lexicography: A Pilot Study of English/French Collocation
> Extraction and Translation”. In Xiao, R. (ed.) Using Corpora in Contrastive
> and Translation Studies. Newcastle: Cambridge Scholars Publishing.
> For more details on the corpus and how to obtain it, please visit the WaCky
> project website:
> http://wacky.sslmit.unibo.it/
> Best,
> The WaCkies
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

--
Janne Bondi Johannessen
Professor, The Text Laboratory, ILN, http://www.hf.uio.no/tekstlab/
President, NEALT, http://omilia.uio.no/nealt/
University of Oslo
P.O.Box 1102 Blindern, N-0317 Oslo, Norway
Tel: +47 22 85 68 14, mob.: +47 928 966 34
