[Corpora-List] Release of a German-English Parallel Corpus

manaal faruqui manaalfar at gmail.com
Mon May 14 17:12:22 CEST 2012


Hello everyone,

We would like to announce the distribution of a German-English parallel corpus of 18th/19th century literary texts.

The corpus has been constructed from a total of 106 public-domain novels and stories, mostly 19th-century texts collected from the Project Gutenberg website. The texts are available for research purposes (see the website for details).

The texts are segmented into paragraphs, sentences and words, are aligned at the sentence level, and are POS-tagged and lemmatized in both languages.

Furthermore, the German sentences are labeled with T/V (formality) information derived from their pronouns, and these labels have been copied onto the aligned English sentences. For details, see our paper: Manaal Faruqui and Sebastian Pado, "Towards a model of formal and informal address in English", presented at EACL 2012.

For the corpus, and more information, please see http://www.nlpado.de/~sebastian/data/tv_data.shtml.

Regards, Manaal Faruqui

Manaal Faruqui | Final Year Dual Degree | Computer Science and Engg | IIT Kharagpur

Website: http://cse.iitkgp.ac.in/~manaalf
Mobile: +91-9932900944


