[Corpora-List] April 2022 Newsletter - LDC

Penn LDC ldc at ldc.upenn.edu
Fri Apr 15 16:26:23 CEST 2022


In this newsletter:

LDC Celebrates 30 Years
LDC Releases Ukrainian Data for Disaster and Refugee Relief Research

New publication:

LORELEI Wolof Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2022T03>

________________________________

LDC Celebrates 30 Years

April 2022 marks the beginning of LDC's 30th year as the leader in language resource development and distribution. Founded in 1992, the Consortium has grown from a data repository to a vibrant data center that creates, shares, and preserves language resources for research, education, and technology development. The Catalog continues to grow, housing over 900 titles in more than 90 languages. With the support of members, licensees, sponsors, and collaborators, LDC has distributed over 200,000 copies of data to more than 6,000 organizations worldwide. We are sincerely grateful to the community, and we pledge to continue the mission to provide diverse data, high-quality member services, and research program support.

Stay tuned for upcoming newsletter highlights from the last three decades!

LDC Releases Ukrainian Data for Disaster and Refugee Relief Research

LDC is releasing Ukrainian data it developed in the DARPA AIDA program, the NIST Language Recognition Evaluation series and the DARPA LORELEI program under a special no-cost, limited license for disaster and refugee relief research.

These resources are available in three corpora:

LDC2022E06 AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts
LDC2020T24 LORELEI Ukrainian Representative Language Pack
LDC2020T10 LORELEI Entity Detection and Linking Knowledge Base

For further information about these data sets and licensing terms, see Disaster and Refugee Relief Research<https://www.ldc.upenn.edu/collaborations/current-projects/disaster-and-refugee-relief-research>.

________________________________

New publication:

LORELEI Wolof Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2022T03> was developed by LDC and comprises approximately 225,000 words of Wolof monolingual text, 115,000 Wolof words translated from English data, 15,000 words annotated for named entities, and 5,000-8,000 words annotated for entity discovery and linking and situation frames.

The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.

Data was collected from news, social network, weblog, discussion forum, and reference material. Entity detection and linking annotation identified entities to be detected by systems for scoring purposes. Situation frame analysis was designed to extract basic information about needs and relevant issues for planning a disaster response effort.

The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.

LORELEI Wolof Representative Language Pack is distributed via web download.

2022 Subscription Members will automatically receive copies of this corpus. 2022 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu<mailto:ldc at ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
