[Corpora-List] A list of German N+N compounds and Licensing Questions

Yannick Versley versley at cl.uni-heidelberg.de
Wed Aug 26 14:58:55 CEST 2015

Dear Liling,

if you look at similar resources, you see that you have a number of choices:

The Moby Thesaurus II, a list of English words with synonyms and related terms, was placed in the public domain: https://en.wikipedia.org/wiki/Moby_Project#Thesaurus

If you want to formally put a dataset in the public domain, there are the *CC0* license and the *Unlicense*: http://choosealicense.com/licenses/cc0-1.0/ http://choosealicense.com/licenses/unlicense/ Their main purpose is to tell people that they can do whatever they want with the data, but that they are not entitled to sue you if they burn a hole in their jacket in the process.

If you create something more substantial, like the ZMorge or Lefff lexicon, you may want to use a copyleft license such as *CC-BY-SA* or *LGPL-LR*, which ensure that improvements to the lexicon can always be incorporated back into the original resource: http://kitt.ifi.uzh.ch/kitt/zmorge/ http://alpage.inria.fr/~sagot/lefff-en.html https://creativecommons.org/licenses/by-sa/3.0/deed.en http://infolingu.univ-mlv.fr/DonneesLinguistiques/Lexiques-Grammaires/lgpllr.html

As for the distinction between your own creation and something "somehow compiled": any creative endeavour that surpasses a certain threshold of triviality can be subject to *copyright protection*. Compiling a list of compounds may or may not pass this threshold. Compiling a list of quotable citations may give rise to *database protection*, where no single item enjoys protection but the ensemble of all the items does.

Similarly, there is (in almost all copyright laws) an exemption for people using *minuscule portions* of someone else's work for quoting or criticism. Trigger-happy IP licensing agencies have rendered this a bit of a murky area, but most people agree that collecting small bits of text (i.e., a small quantity as well as an insubstantial portion of the whole work) for a noncommercial purpose is not something for which you need a license to the text(s). [related link] http://ipbreakdown.com/blog/a-requiem-for-a-lawsuit-signifying-nothing-de-minimis-and-fair-usea-requiem-for-a-lawsuit-signifying-nothing-de-minimis-and-fair-use/

Best wishes, Yannick

On Wed, Aug 26, 2015 at 2:06 PM, liling tan <alvations at gmail.com> wrote:

> Dear Corpora community,
> Sorry, the link to the list was scrubbed by the mail server; try
> https://raw.githubusercontent.com/alvations/DLTK/master/N%2BN%20(Ohne%20Fugenelement)
> or https://goo.gl/7DpFJX
> Best Regards,
> Liling
> On Wed, Aug 26, 2015 at 2:02 PM, liling tan <alvations at gmail.com> wrote:
>> Dear Corpora researchers/enthusiasts,
>> I have somehow compiled a list of German N+N compounds (Komposita):
>> https://github.com/alvations/DLTK/blob/master/N%2BN%20(Ohne%20Fugenelement)
>> I am asking the corpora community for help in understanding how to license a
>> lexicon, list or corpus that was compiled without one or several
>> primary sources and was mainly generated by a sort of armchair linguist.
>> And how could I substantiate an open license for data that was created
>> this way? Armchair linguists, for instance, sit and think up examples; if
>> they did license their examples, vocabulary, glossaries or corpora, how
>> did they substantiate the license?
>> The list comes from my own learning as I read online materials
>> and listened to how people talk on the street. How can we
>> open-source materials compiled this way? Who holds the copyright to
>> such a list?
>> Previously, there was another corpus that was "somehow compiled":
>> https://github.com/alvations/Quotables and since it's a list of
>> quotations, who holds the copyright to those quotes? Ideally, the person
>> who said it holds the copyright, but many of them are deceased.
>> Best Regards,
>> Liling
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
