[Corpora-List] Re: Corpora Digest, Vol 98, Issue 30

唐永力 happytyl1990 at 126.com
Tue Sep 1 17:56:41 CEST 2015


unsubscribe

Sent from NetEase Mail Master

On 30 August 2015 at 18:00, corpora-request at uib.no wrote:

Today's Topics:

1. Kurmanji Speech Corpus (Adel Rahimi)

2. Re: A list of German N+N compounds and Licensing Questions (Lluís Padró)

3. deadline extension: LRE special issue on how to standardize historical corpora (Eszter Simon)

4. Re: The global CQPweb family (Hardie, Andrew)

----------------------------------------------------------------------

Message: 1
Date: Sat, 29 Aug 2015 17:25:21 +0430
From: Adel Rahimi <s912162006 at edu.ikiu.ac.ir>
Subject: [Corpora-List] Kurmanji Speech Corpus
To: corpora at uib.no

Dear Corpora members,

We are a group of linguists working on Kurmanji speech processing. We are building the first Kurmanji Speech Corpus, and we need help with this project. Transcribing, translating, and annotating hours of recordings is a tremendously hard task. We therefore ask any organizations, institutions, and individuals who can either help or provide resources for this project to contact us. Any kind of help is appreciated.

PS. You can see sample files at: http://adelra.github.io/ksc

Faithfully yours,
Adel Rahimi
http://adelra.github.io

------------------------------

Message: 2
Date: Wed, 26 Aug 2015 15:31:59 +0200
From: Lluís Padró <padro at cs.upc.edu>
Subject: Re: [Corpora-List] A list of German N+N compounds and Licensing Questions
To: corpora at uib.no

As with any other linguistic resource (e.g. a lexicon), licensing your list doesn't mean you own the words in it (they belong to the language and its speakers); you just own the resource itself, that is, the compilation of the list and its encoding in electronic or paper form.

So, if you created the list yourself, you own the copyright (i.e. the right to establish under which terms the list can be copied and reused). You do not own copyright on the words you heard or read somewhere, because they are part of the language. Your work is compiling them, not creating them. So, you can claim rights on the list itself, but not on the individual words it contains.

If you want to release your list under an open source license, I recommend you have a look at Creative Commons licenses: www.creativecommons.org

Other possibilities are:

http://www.cnrtl.fr/lexiques/prolex/licence_lgpl-lr.php

http://www.cecill.info/licences/Licence_CeCILL-C_V1-en.html

On 26/08/15 14:02, liling tan wrote:
> Dear Corpora researchers/enthusiasts,
>
> I have somehow compiled a list of German N+N compounds (Komposita):
> https://github.com/alvations/DLTK/blob/master/N%2BN%20(Ohne%20Fugenelement)
>
> I seek the corpora community's help in understanding how to license a
> lexicon, list or corpus that was compiled without a single or several
> primary sources and mainly generated by a sort of armchair linguist.
>
> And how could I substantiate an open license for data that is somehow
> created? Armchair linguists sit and think of examples; if they did
> license their examples, vocabulary, glossary or corpora, how did they
> substantiate the license?
>
> The source of the list was from my own learning as I read online
> materials and listen to how people talk on the street. How can we
> open-source such materials compiled? Who holds the copyrights to such
> a list?
>
> Previously, there was another corpus that was "somehow compiled":
> https://github.com/alvations/Quotables and, since it's a list of
> quotations, who holds the copyrights to those quotes? Ideally, the
> person who said it holds the copyright, but many of them are deceased.
>
> Best Regards,
> Liling
>
>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


------------------------------

Message: 3
Date: Fri, 28 Aug 2015 13:57:29 +0200
From: Eszter Simon <simon.eszterke at gmail.com>
Subject: [Corpora-List] deadline extension: LRE special issue on how to standardize historical corpora
To: corpora at uib.no

***** Apologies for cross-posting *****

We are inviting submissions for a Special Issue of the Language Resources and Evaluation Journal, entitled "Converging Corpora: How to standardize historical corpora of typologically and genetically different languages".

CALL FOR PAPERS

The availability of annotated language resources is becoming an increasingly important factor in more and more domains of linguistic research, since high-quality linguistic databases can provide a fertile ground for theoretical investigations. Historical corpora represent a rich source of data, but only if the relevant information is specified in a computationally retrievable and interpretable way.

Several databases of historical texts enriched with some kind of linguistic information and metadata have recently been created for various Indo-European languages, such as the Penn Corpora of Historical English, the Tycho Brahe Parsed Corpus of Historical Portuguese, or the Welsh Prose corpus, and for non-Indo-European languages as well, cf. the Old Hungarian Corpus.

With the recent increase in the number of annotated historical corpora, it seems advisable to move towards a harmonized common framework and methodology. An important goal of the special issue is to highlight the issues we encounter when annotating languages with rich morphology.

Questions we would like to be addressed include:

- To what extent should the existing annotation schemes be extended for the incorporation of highly inflected languages?
- How can existing schemes be extended to accomplish this?
- How can the linguistic annotation of historical corpora be standardized to serve an easy-to-use data access for linguists?

We invite submissions of articles describing annotation schemes of historical corpora, attempts at standardization, and harmonized annotation frameworks.

To provide a possibility of collaboration, we organized a special workshop of the 16th Diachronic Generative Syntax conference on "Converging Corpora: How to standardize historical corpora of typologically and genetically different languages". A natural candidate for this call is an extended paper from the workshop presentations. However, we do not limit the contributions to DiGS-related works. Instead, other works presenting standardization efforts of annotation schemes of historical corpora are also welcome.

Finally, papers describing concrete historical corpora or tools adapted to old language varieties are also welcome, provided they highlight important properties of the problem of standardization and present relevant solutions.

IMPORTANT DATES

Submissions due: *14 September 2015*
Author notification of acceptance: 30 November 2015
Final manuscripts submitted: 31 March 2016

SUBMISSION OF WORKS

To prepare the papers, please follow the style guidelines provided by the LRE journal.

To submit papers:
- Go to http://www.editorialmanager.com/lrev/
- Register and login as an author.
- Select "S.I. : Converging Corpora" as article type.
- Follow the instructions and submit your paper.

GUEST EDITORS

- Tamás Váradi, Research Institute for Linguistics, Hungarian Academy of Sciences (varadi.tamas at nytud.mta.hu)

- Eszter Simon, Research Institute for Linguistics, Hungarian Academy of Sciences (simon.eszter at nytud.mta.hu)

--
DR. ESZTER SIMON
Research Fellow
Research Institute for Linguistics
Hungarian Academy of Sciences
H-1068 Budapest, Benczúr u. 33.
Tel./Fax. +36 1 321 4830/ 129
simon.eszter at nytud.mta.hu

------------------------------

Message: 4
Date: Sun, 30 Aug 2015 02:23:17 +0000
From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
Subject: Re: [Corpora-List] The global CQPweb family
To: "CORPORA at UIB.NO" <corpora at uib.no>, Aynat Rubinstein <aynat.rubinstein at mail.huji.ac.il>, XuJiajin <xujiajin at outlook.com>

At the distinct risk of boring list members who do not care about these things...

That yeda.cs.technion.ac.il site is not "using a web interface developed in-house"; it's using a lightly modified version of CQPweb v3.0 with a new (and false) copyright attribution in the footer.

In fact, the original copyright mark is still there but commented out in the HTML:

<!--td align="left" class="cqpweb_copynote" width="33%">

CQPweb v3.0.7 &#169; 2008-2012

</td--> [...]

<p>Copyright (c) 2012 HebrewCQPweb.com. All rights reserved. Design by <a href="http://www.freecsstemplates.org">FCT</a>.</p>

Of course, HebrewCQPweb.com does not own any of the code apart from their own modifications.
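
For readers who want to run the same check on a page they have saved locally, here is a minimal sketch that lists HTML comments mentioning a given string. It is not part of the original message: it uses Python's standard `html.parser`, the names `CommentCollector` and `commented_out_notices` are illustrative, and the sample markup is an abridged, hypothetical copy of the footer quoted above.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # HTMLParser calls this with the text between <!-- and -->.
        self.comments.append(data)


def commented_out_notices(html, needle="cqpweb"):
    """Return comments whose text contains `needle` (case-insensitive)."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return [c.strip() for c in collector.comments if needle in c.lower()]


# Hypothetical, abridged sample modeled on the footer quoted above.
sample = (
    '<tr><!--td align="left" class="cqpweb_copynote" width="33%">'
    "CQPweb v3.0.7 &#169; 2008-2012"
    "</td--></tr>"
    "<p>Copyright (c) 2012 HebrewCQPweb.com. All rights reserved.</p>"
)

for note in commented_out_notices(sample):
    print(note)
```

The visible footer paragraph is not reported, because only comment text is inspected; a hit merely shows that a notice was commented out, not why.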

However, I will presume that the claiming of copyright over the system is an unintentional stumble arising from a misunderstanding of the terms of the GNU General Public Licence under which the code to CWB and CQPweb is distributed, rather than an intentional attempt at theft of intellectual property!

best

Andrew.

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Aynat Rubinstein
Sent: 26 August 2015 07:50
To: XuJiajin
Cc: CORPORA at UIB.NO
Subject: Re: [Corpora-List] The global CQPweb family

To add Hebrew to the picture:

The Knowledge Center for Processing Hebrew (MILA) has made available 11 of its corpora for search with CQP, using a web interface developed in-house. Credit to all developers is given on the website:

http://yeda.cs.technion.ac.il/HebrewCqpWeb/
http://yeda.cs.technion.ac.il/files/HebrewInterfaceToCQP.pdf

Best,

Aynat

On Wed, Aug 26, 2015 at 9:15 AM, XuJiajin <xujiajin at outlook.com> wrote:

Dear Andrew,

Thanks for clarifying the distinction between CWB/CQP and CQPweb. Actually, four days ago, I split the bookmarked URLs into two sections on my CQPweb family page (http://www.bfsu-corpus.org/content/cqpweb-family), considering their respective development histories. Thank you too for adding the referral link on the CWB website.

Best,
Jiajin

________________________________
From: a.hardie at lancaster.ac.uk
To: xujiajin at outlook.com; corpora at uib.no
Subject: RE: [Corpora-List] The global CQPweb family
Date: Tue, 25 Aug 2015 07:47:26 +0000

To chime in on this slightly late, it's worth noting that, like some of the later suggestions made in replies, not all of the systems on the original list actually use CQPweb: some use an alternative web-interface built around CWB/CQP. For instance, IntelliText is CWB/CQP based but does not use CQPweb.

This might seem like splitting hairs but classing these as "CQPweb" misallocates the credit for a lot of hard work on the part of their developers...

But thank you very much for compiling this extremely useful list. I have added a link from the CWB website (see http://cwb.sourceforge.net/demos.php#public ), and I hope later to add a more extensive set of links to public-access CQPweb servers elsewhere on the site.

best

Andrew.

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of XuJiajin
Sent: 20 August 2015 08:01
To: CORPORA at UIB.NO
Subject: [Corpora-List] The global CQPweb family

Dear Corpora List members,

I've just collated information on corpus portals using the CQPweb infrastructure: http://www.bfsu-corpus.org/content/cqpweb-family

Are you aware of any other CQPweb sites which are not on the list?

Cheers,

Jiajin Xu
___________
Ph.D., Professor
National Research Centre for Foreign Language Education
Beijing Foreign Studies University
Beijing 100089 China
http://www.bfsu-corpus.org


--
Aynat Rubinstein, Ph.D.
Mandel Fellow
Mandel Scholion -- Interdisciplinary Research Center in the Humanities and Jewish Studies
The Hebrew University of Jerusalem

----------------------------------------------------------------------

Send Corpora mailing list submissions to

corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit

http://mailman.uib.no/listinfo/corpora

or, via email, send a message with subject or body 'help' to

corpora-request at uib.no

You can reach the person managing the list at

corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific than "Re: Contents of Corpora digest..."


End of Corpora Digest, Vol 98, Issue 30
***************************************


