[Corpora-List] Spanish reference corpus

Adam Kilgarriff adam at lexmasterclass.com
Fri Feb 2 08:55:00 CET 2007


Mario,

Yes, the frequencies etc are available for this corpus via the Sketch
Engine, a corpus query tool which allows the user to specify and collect
frequency lists to a wide range of specifications (as well as offering a
range of other functions including concordancing, 'word sketches' and a
distributional thesaurus).

We have taken the URL list as supplied by Serge Sharoff, re-collected the
corpus (or, at least, a 95% similar corpus) and installed it into the Sketch
Engine. You can self-register for a trial account at
http://www.sketchengine.co.uk

Enjoy!

Adam

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Mario Crespo Miguel
Sent: 01 February 2007 13:17
To: s.sharoff at leeds.ac.uk
Cc: corpora at lists.uib.no
Subject: Re: [Corpora-List] Spanish reference corpus

Thank you very much for helping me, but I think it would be more
convenient for me if the frequencies of the words in this
open-domain / general corpus could be obtained. Does anybody know if
such information is available in some way? Best,

Mario



On 30 Jan 2007 at 16:10, Serge Sharoff <s.sharoff at leeds.ac.uk>
wrote:


> one answer is the Spanish Internet corpus with the interface from
> http://corpus.leeds.ac.uk/internet.html
> and the URL list
> http://corpus.leeds.ac.uk/internet/final-url-es.gz
>
> This is a random snapshot of the Spanish Internet of about 120 million
> words, see
> Sharoff, S (2006) Creating general-purpose corpora using automated
> search engine queries. In Marco Baroni and Silvia Bernardini, editors,
> WaCky! Working papers on the Web as Corpus. Gedit, Bologna.
> http://wackybook.sslmit.unibo.it/
>
> S
>
> On Tue, 2007-01-30 at 15:54 +0100, Mario Crespo Miguel wrote:
>> Dear everybody,
>>
>> Thank you again for all the help that I always get with this
>> mailing list, and this time I would like to ask if there is
>> some reference / open-domain corpus for Spanish which is freely
>> available and could be downloaded. Thank you in advance. Best
>> wishes,
>>
>> Mario Crespo Miguel










