[Corpora-List] POS-tagger maintenance and improvement

Andras Kornai andras at kornai.com
Thu Feb 26 21:50:18 CET 2009


On Thu, Feb 26, 2009 at 09:33:45PM +0100, Francis Tyers wrote:
> It does not allow derivative works. So for example if I want to take the
> corpus and add some fancy new markup to it, I could not redistribute it[1] under a
> free software licence (BSD, LGPL, GPL, ...) for others to benefit.
>
> 1. For example put it in a public revision control system.

Yes, absolutely true, you can't redistribute LDC corpora. (I think we actually retained the right to distribute Hunglish ourselves, but so far have had no reason to exercise it.) However, to get back to the main point: if you spotted errors and created diffs, a clearinghouse could hold the diffs in CVS (this is your work, and is clearly de minimis, so you can LGPL or BSD license it), making it trivial for future users to pull them down and patch the corpus. A repository of this sort makes good sense. If you (all) have patches you are willing to contribute, drop me a line; maybe we will set something up after all.
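
To make the diff idea concrete, here is a minimal sketch of producing such a patch with Python's standard difflib module. The file name corpus.txt, the word/TAG line format, and the helper name make_corpus_patch are assumptions for illustration only, not the layout of any actual LDC corpus or an existing tool.

```python
import difflib


def make_corpus_patch(original_lines, corrected_lines, name="corpus.txt"):
    """Return a unified diff (as one string) from the original to the corrected lines.

    `name` is a hypothetical file name used only in the diff headers.
    """
    diff = difflib.unified_diff(
        original_lines,
        corrected_lines,
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)


# Toy data in an assumed word/TAG format: "barks" is mistagged as a plural noun.
original = ["The/DT dog/NN barks/NNS ./.\n", "It/PRP runs/VBZ ./.\n"]
corrected = ["The/DT dog/NN barks/VBZ ./.\n", "It/PRP runs/VBZ ./.\n"]

patch_text = make_corpus_patch(original, corrected)
print(patch_text)
```

The resulting text contains only the changed lines plus a little surrounding context, so it is the diff, not the corpus, that would be checked into a shared repository. A user holding a licensed copy of the corpus could then apply it with the standard patch utility (for example, patch -p1 run from the directory containing the corpus file).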

Andras Kornai


