[Corpora-List] Corpora Digest, Vol 5, Issue 27

René Venegas rene.venegas at ucv.cl
Wed Nov 28 15:33:00 CET 2007


Dear Mario

You will find a tagged Spanish corpus at www.elgrial.cl. You can run morphosyntactic queries on it, and it is free to use for research purposes.

Dr. René Venegas, Professor, Programa de Postgrado en Lingüística www.postgradolinguistica.ucv.cl/rene www.linguistica.cl www.elgrial.cl

Instituto de Literatura y Ciencias del Lenguaje www.ilcl.ucv.cl

Pontificia Universidad Católica de Valparaíso www.ucv.cl

Assistant, Revista Signos. Estudios de Lingüística www.scielo.cl/signos.htm www.revistasignos.cl

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On behalf of corpora-request at uib.no
Sent: Wednesday, 28 November 2007 11:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 5, Issue 27

Today's Topics:

1. frequency dictionary of verbs (Jesús Fernández)

2. Spanish corpus (Mario Crespo Miguel)

3. Spanish corpus (Valerie Mapelli)

4. DGT-TM - Translation Memory for 231 language pairs available for distribution (Ralf Steinberger)

----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Nov 2007 15:27:05 +0100
From: "Jesús Fernández" <jesusferdom_AT_gmail.com>
Subject: [Corpora-List] frequency dictionary of verbs
To: CORPORA_AT_uib.no

Dear David, Adam, Jennifer and Suzan,

Thank you so much for your replies. You got straight to the point even though I was not very specific about the requirements.

Below are the links which you have provided in case someone else is interested:

- http://ota.ahds.ac.uk/ (The Oxford Text Archive)
- http://www.sketchengine.co.uk (generation of frequency lists from English corpora)
- http://www.comp.lancs.ac.uk/ucrel/bncfreq/lists/5_2_all_rank_verb.txt (frequency list of verbs by lemma, from Leech, Rayson & Wilson 2001)

Best, Jesús.

------------------------------

Message: 2
Date: Wed, 28 Nov 2007 09:54:58 +0100 (CET)
From: Mario Crespo Miguel <mario.crespo_AT_uca.es>
Subject: [Corpora-List] Spanish corpus
To: CORPORA_AT_UIB.NO

Dear all,

I wonder whether anyone on the list knows of a syntactically and/or morphologically tagged corpus of Spanish that could be purchased or obtained for research purposes. Thank you very much in advance,

best

Mario Crespo Miguel

------------------------------

Message: 3
Date: Wed, 28 Nov 2007 10:03:03 +0100
From: Valerie Mapelli <mapelli_AT_elda.org>
Subject: [Corpora-List] Spanish corpus
To: Mario Crespo Miguel <mario.crespo_AT_uca.es>, CORPORA_AT_UIB.NO

Dear Mario,

You may be interested in the MULTEXT JOC Corpus, which includes morpho-syntactic annotation and is available in the ELRA catalogue: http://catalog.elra.info/product_info.php?products_id=534

The CRATER Corpus could also suit your needs: http://catalog.elra.info/product_info.php?products_id=84

Please do not hesitate to contact me for any further information.

Best regards,

Valerie Mapelli


------------------------------

Message: 4
Date: Wed, 28 Nov 2007 14:48:09 +0100
From: Ralf Steinberger <ralf.steinberger_AT_jrc.it>
Subject: [Corpora-List] DGT-TM - Translation Memory for 231 language pairs available for distribution
To: CORPORA_AT_uib.no


Apologies for cross-postings.

DGT-TM Translation Memory

Freely available

22 languages

231 language pairs

Format: TMX version 1

http://langtech.jrc.it/DGT-TM.html

The European Commission's Directorate General for Translation (DGT) and the Joint Research Centre (JRC) have made available a multilingual Translation Memory (sentences and their translations, in standard TMX format) for the 22 official European Union languages Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish.
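As a quick check of the figures in this announcement, 22 languages give 22 x 21 / 2 = 231 unordered language pairs. The short Python sketch below reproduces that count; the variable names are illustrative only and are not part of the DGT-TM distribution.

```python
from itertools import combinations

# The 22 languages listed in the announcement.
LANGUAGES = [
    "Bulgarian", "Czech", "Danish", "Dutch", "English", "Estonian",
    "German", "Greek", "Finnish", "French", "Hungarian", "Italian",
    "Latvian", "Lithuanian", "Maltese", "Polish", "Portuguese",
    "Romanian", "Slovak", "Slovene", "Spanish", "Swedish",
]

# Every unordered pair of distinct languages (English-Spanish counts once,
# not separately from Spanish-English).
language_pairs = list(combinations(LANGUAGES, 2))

print(len(LANGUAGES), "languages,", len(language_pairs), "language pairs")
```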

This release follows the public release - in May 2006 - of the JRC-Acquis multilingual parallel corpus (http://langtech.jrc.it/JRC-Acquis.html) with sentence alignment for 231 language pairs and a total size of over 1 billion words.

The data releases of DGT and JRC are in line with the general effort of the European Commission to support multilingualism, language diversity and the re-use of Commission information.

The Translation Memory contains most, but not all, of the Acquis Communautaire, which is the entire body of European legislation, including all the treaties, regulations and directives adopted by the European Union (EU) and the rulings of the European Court of Justice. Since each new country joining the EU is required to accept the whole Acquis Communautaire, this body of legislation is translated into 22 official EU languages. For the 23rd official EU language, Irish, the Acquis is not translated on a regular basis.

A translation memory is a collection of small text segments and their translations. These segments can be sentences or parts of sentences. Translation memories support translators by ensuring that pieces of text that have already been translated do not need to be translated again.
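To make the idea concrete, here is a minimal Python sketch of reading segment pairs from a TMX document and reusing an existing translation. It relies only on the standard TMX elements (tu, tuv, seg) and the xml:lang attribute. The sample document, its sentences, and the language codes "en" and "es" are invented for illustration; the actual DGT-TM files may use different codes and carry more metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample, NOT taken from DGT-TM; for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Article 1</seg></tuv>
      <tuv xml:lang="es"><seg>Artículo 1</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Done at Brussels</seg></tuv>
      <tuv xml:lang="es"><seg>Hecho en Bruselas</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Annex</seg></tuv>
      <tuv xml:lang="fr"><seg>Annexe</seg></tuv>
    </tu>
  </body>
</tmx>"""


def extract_pairs(tmx_text, src, tgt):
    """Return (source, target) segment pairs for one language pair."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[lang] = "".join(seg.itertext()).strip()
        # Keep only units that contain both requested languages.
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs


pairs = extract_pairs(SAMPLE_TMX, "en", "es")
memory = dict(pairs)  # exact-match lookup: source segment -> stored translation

print(memory.get("Article 1"))       # a segment that was translated before
print(memory.get("A new sentence"))  # None: not in the memory yet
```

Real translation tools usually also offer fuzzy matching for near-identical segments; the exact-match dictionary here is only the simplest case.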

Both translation memories and parallel texts are an important linguistic resource that can be used for a variety of purposes, including:

training automatic systems for Statistical Machine Translation (SMT);

producing monolingual or multilingual lexical and semantic resources such as dictionaries and ontologies;

training and testing multilingual information extraction software;

checking translation consistency automatically;

testing and benchmarking alignment software (for sentences, words, etc.).
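As one illustration of the uses listed above, an automatic consistency check can be sketched as follows: group the stored translations by source segment and report every source segment that has more than one distinct translation. The function name and the sample pairs are hypothetical and are not taken from DGT-TM.

```python
from collections import defaultdict


def find_inconsistencies(pairs):
    """Map each source segment to its translations when they disagree."""
    translations = defaultdict(set)
    for source, target in pairs:
        translations[source].add(target)
    # Only sources with two or more distinct translations are reported.
    return {
        source: sorted(targets)
        for source, targets in translations.items()
        if len(targets) > 1
    }


# Invented sample pairs (source, target); for illustration only.
sample_pairs = [
    ("Article 1", "Artículo 1"),
    ("Article 1", "Articulo 1"),  # same source, different translation
    ("Done at Brussels", "Hecho en Bruselas"),
    ("Done at Brussels", "Hecho en Bruselas"),  # exact repeat, consistent
]

inconsistent = find_inconsistencies(sample_pairs)
for source, targets in inconsistent.items():
    print(source, "->", targets)
```

A flagged segment is not necessarily an error, since context can justify different translations; the check only points a reviewer at candidates.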

For usage conditions, details regarding the difference between DGT-TM and the JRC-Acquis (http://langtech.jrc.it/JRC-Acquis.html), size information, downloading instructions, etc., go to http://langtech.jrc.it/DGT-TM.html.

Achim Blatt

Directorate General for Translation (DGT)

Unit DGT.R.3 Informatics (http://ec.europa.eu/dgs/translation/)

Ralf Steinberger

European Commission - Joint Research Centre (JRC)

IPSC - SeS - Language Technology (http://langtech.jrc.it)

The JRC's Language Technology group specialises in the development of highly multilingual text analysis tools and in cross-lingual applications. Many applications are accessible online, e.g.:

NewsExplorer (http://press.jrc.it/NewsExplorer/): multilingual news aggregation and analysis (19 languages); lets users navigate the news over time and across languages; trend analysis; collects information about people from the news; social network detection.

NewsBrief (http://press.jrc.it/): breaking news detection and display of the very latest thematic news from around the world; email alerting (22+ languages).

MedISys Medical Information System (http://medusa.jrc.it/): latest health-related news from around the world according to themes and diseases (22+ languages).

----------------------------------------------------------------------

Send Corpora mailing list submissions to

corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit

http://mailman.uib.no/listinfo/corpora or, via email, send a message with subject or body 'help' to

corpora-request at uib.no

You can reach the person managing the list at

corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific than "Re: Contents of Corpora digest..."

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

End of Corpora Digest, Vol 5, Issue 27
**************************************


