[Corpora-List] Romanian corpus

Eckhard Bick eckhard.bick at mail.dk
Mon Jul 2 15:02:17 CEST 2007


I would like to announce the completion of a grammatically annotated
Romanian corpus at http://corp.hum.sdu.dk

The corpus covers the business language domain and contains 21.4
million words (27 million tokens). It was compiled by Arina Greavu
(arinagreavu at yahoo.com) from news text sources and annotated for (a)
PoS and morphology, using Dan Tufis' tagger
(http://www.infoiasi.ro/bin/view/Structure/tufis), and (b)
syntactic function and shallow dependency markers, using a Constraint
Grammar system at VISL
(http://beta.visl.sdu.dk/constraint_grammar.html). Both the text and the
annotation can be searched without a password through a menu-based interface.
Note, however, that search results are shown in concordance format, not as
running text or entire articles.

Best regards,
Eckhard Bick

Eckhard Bick,
cand.med., dr.phil.
University of Southern Denmark
e-mail: eckhard.bick at mail.dk
web: http://beta.visl.sdu.dk
