[Corpora-List] Developing Parallel Corpus

Darren Cook darren at dcook.org
Fri Apr 18 12:32:43 CEST 2014

> I am currently doing an MS, and for my final research I want to develop
> a parallel corpus. I have translations for the source and target languages.
> What else do I have to do in order to develop the parallel corpus? Do I
> have to tokenize this data, or do any other processing on this text?

An article [1] in the first issue of the Journal of Language Modelling gives a nice overview of what corpora and parallel corpora cover. (I happened to read and enjoy it recently, which is why it came to mind; I'm sure there are other articles on the subject.)


[1]: "The Bulgarian National Corpus: Theory and Practice in Corpus Design"
     http://jlm.ipipan.waw.pl/index.php/JLM/article/view/33

--
Darren Cook, Software Researcher/Developer
My new book: Data Push Apps with HTML5 SSE
Published by O'Reilly: (ask me for a discount code!)
  http://shop.oreilly.com/product/0636920030928.do
Also on Amazon and at all good booksellers!
