[Corpora-List] Resource inventory for SWE-CLARIN

Eva Forsbom evafo at stp.lingfil.uu.se
Mon Feb 18 14:51:11 CET 2013

Dear colleague,

In 2007, we conducted a survey of language resources and tools for Swedish (Elenius et al., 2008) as part of a planning grant for a national venture to develop "An Infrastructure for Swedish Language Technology", funded by the Swedish Research Council's Committee for Research Infrastructures 2007-2008.

This was pre-CLARIN*. We now have a new planning grant for SWE-CLARIN ("Towards a Swedish eScience infrastructure for the humanities and social sciences"), funded by the Swedish Research Council's Committee for Research Infrastructures 2013. The infrastructure involves Swedish membership and full participation in the CLARIN ERIC. As part of the planning, we would like to update the survey. (If you participated in the previous survey, you only need to add new resources.)

In addition to resources including Swedish, we would now also like to know about resources covering sign language and the official minority languages in Sweden: Finnish, Meänkieli, Romani chib, Sami, and Yiddish.

The survey mainly covers the following resources and tools (for spoken or written language):

- Language resources:

  - mono- or multilingual corpora

  - mono- or multilingual lexicons

  - terminology archives

  - grammars

- Standard resources (benchmarks) for evaluation

- Tools for processing language data:

  - modules (e.g. part-of-speech taggers, parsers, grapheme-to-phoneme converters)

  - standards and tools for annotation

  - tools for searching and mining information from corpora

For the survey, please fill in a template file for each new resource or tool and attach it to your reply. The template and the definitions it refers to are available from http://stp.lingfil.uu.se/~evafo/clarin/. The template is adapted from the META-SHARE metadata model.

We would also like to know, in your own words, whether you are currently developing or planning to develop any new resources or tools, and whether your resources or tools have been used in any humanities and social sciences projects.

We would like your answer as soon as possible, and no later than February 25, 2013.

Thank you for your cooperation,

Eva Forsbom (evafo at stp.lingfil.uu.se)
Department of Linguistics and Philology
Uppsala University
(on behalf of the planning group)

----
*CLARIN (Common Language Resources and Technology Infrastructure)

Elenius, K., Forsbom, E., and Megyesi, B. 2008. Survey on Swedish Language Resources. Report, February 2008. Dept. of Speech, Music and Hearing, KTH, and Dept. of Linguistics and Philology, Uppsala University. (http://stp.lingfil.uu.se/blark/swe-blark-survey-2008.pdf)
