[Corpora-List] March 2022 Newsletter - LDC

Penn LDC ldc at ldc.upenn.edu
Wed Mar 16 17:58:37 CET 2022


In this newsletter: LDC data and commercial technology development

New Publications:

AttImam<https://catalog.ldc.upenn.edu/LDC2022T02>

HAVIC MED Novel 1 Test - Videos, Metadata and Annotation<https://catalog.ldc.upenn.edu/LDC2022V01>

________________________________

LDC data and commercial technology development

For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for further information.

________________________________

New publications:

(1) AttImam<https://catalog.ldc.upenn.edu/LDC2022T02> was developed by Al-Imam Mohammad Ibn Saud Islamic University<https://imamu.edu.sa/en/Pages/default.aspx> and consists of approximately 2,000 attribution relations applied to Arabic newswire text from Arabic Treebank: Part 1 v 4.1 (LDC2010T13)<https://catalog.ldc.upenn.edu/LDC2010T13>. Attribution refers to the process of reporting or assigning an utterance to the correct speaker.

The source Arabic newswire was collected by LDC from Agence France Presse articles published in 2000. Files were annotated by native Arabic speakers and contain the following elements:

* Cue: the lexical anchor that connects the source with the content.

* Source: the entity or the agent that owns the content.

* Content: the basic element expressing the claim or the reported news.

* General Features: these can include such features as attribution style (direct or indirect), determinacy (factual or non-factual), and purpose (e.g., assertion, expression).
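
As a rough illustration of how the four annotation elements above fit together, one attribution relation could be modeled as a small record. This is only a sketch: the class name, field names, and the English example are assumptions made for readability, not the corpus's actual file format or label set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributionRelation:
    """One attribution relation (hypothetical layout, not AttImam's file format)."""

    cue: str       # lexical anchor that connects the source with the content
    source: str    # entity or agent that owns the content
    content: str   # the claim or the reported news
    # General features; the value names below are assumed spellings.
    style: Optional[str] = None        # "direct" or "indirect"
    determinacy: Optional[str] = None  # "factual" or "non-factual"
    purpose: Optional[str] = None      # e.g. "assertion", "expression"


# Invented English example; the corpus itself annotates Arabic newswire.
example = AttributionRelation(
    cue="said",
    source="the minister",
    content="the talks will resume next week",
    style="indirect",
    determinacy="factual",
    purpose="assertion",
)
```

The general features are optional in this sketch because the description says they "can include" those properties; whether every relation in the corpus carries all of them is not stated here.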

AttImam is distributed via web download.

2022 Subscription Members will automatically receive copies of this corpus. 2022 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

*

(2) HAVIC MED Novel 1 Test - Videos, Metadata and Annotation<https://catalog.ldc.upenn.edu/LDC2022V01> comprises 3,800 hours of user-generated videos with annotation and metadata developed by LDC for the 2015 NIST Multimedia Event Detection tasks. The data consists of videos of various events (event videos) and videos completely unrelated to events (background videos). Each event video was manually annotated with judgments describing its event properties and other salient features. Background videos were labeled with topic and genre categories.

HAVIC MED Novel 1 Test - Videos, Metadata and Annotation is distributed via web download.

2022 Subscription Members will automatically receive copies of this corpus. 2022 Standard Members may request a copy as part of their 16 free membership corpora. This corpus is a members-only release and is not available for non-member licensing. Contact ldc at ldc.upenn.edu<mailto:ldc at ldc.upenn.edu> for information about membership.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu<mailto:ldc at ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
