[Corpora-List] quantities of publicly available parallel text?

maxwell at umiacs.umd.edu
Wed Feb 27 16:15:47 CET 2008

Alexandre Rafalovitch wrote:
> I have more information available, if somebody takes an interest.

I would be very interested in having some place to go for information of this sort, whether general (like my previous message on this thread) or specific to a particular language. The LDC cataloged the information on "found" resources for a few LoDLs (low-density languages), but the page seems to have been taken down. I have some pointers at http://www.netvouz.com/mcswell/folder/4234597228659620420/Languages, but I have not attempted to keep them up to date, and in most cases I don't know the languages, so some of the pointers are questionable. A number of other people have made similar catalogs, some of which are linked from my page.

OLAC is of course another catalog, but I believe it is really intended to catalog resources available from formal archiving institutions. While it would be good if everyone put their resources in such places, that isn't happening, and the archives might be overwhelmed if everyone started doing so.

One place, maybe the best, where general information on resources (both found and created) could be tracked is the ACL wiki: http://aclweb.org/aclwiki/index.php?title=List_of_resources_by_language

Mike Maxwell

