[Corpora-List] Interesting Corpus analysis tools & specific corpora

Janne Bondi Johannessen jannebj at gmail.com
Thu Jul 19 18:38:36 CEST 2018


Dear Irina Temnikova,

1) Most of the Norwegian corpora at the University of Oslo are available in the Glossa search system, which was developed specifically to be easy for linguists to use. You can read about the system here:

- Nøklestad, Anders, Hagen, Kristin, Johannessen, Janne Bondi, Kosek,
  Michal and Joel Priestley. 2017. A modernised version of the Glossa
  corpus search system <http://urn.nb.no/URN:NBN:no-62240>. In Jörg
  Tiedemann (ed.): *Proceedings of the 21st Nordic Conference on
  Computational Linguistics (NoDaLiDa)*. 2017, 251-254.

2) For Norwegian corpora (and a few in other languages), please see:
https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/index.html#written

There are written corpora, learner corpora, corpora of young schoolchildren's language, parallel corpora and some stunning spoken language corpora (Nordic Dialect Corpus, Nota Oslo Corpus, etc.).

Most of the corpora are available with passwords from Clarin and Edugain.

Best, Janne Bondi Johannessen

2018-07-19 18:16 GMT+02:00 Maria Gavriilidou <maria at ilsp.gr>:


> Dear Irina,
>
> you could try clarin:el (www.clarin.gr/en), the Greek infrastructure for
> language resources and related tools offered as web services (part of the CLARIN
> network <http://www.clarin.eu/>).
>
> In clarin:el the users can select resources (monolingual Greek or
> multilingual parallel corpora containing Greek) from the inventory or
> upload their own resources to process using the tools offered by the
> infrastructure.
>
> best regards,
>
> Maria
>
>
>
>
>
> Maria Gavriilidou
>
> ILSP/R.C. ‘Athena’
>
> Epidavrou & Artemidos 6
>
> GR-15125 Marousi
>
> Athens
>
> Greece
>
> Tel.: +30 210 6875441
>
> Email: maria at ilsp.athena-innovation.gr
>
> URL: www.ilsp.gr
>
>
>
> *From:* corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] *On Behalf
> Of *Adam Ek
> *Sent:* Thursday, July 19, 2018 5:38 PM
> *To:* Irina Temnikova; CORPORA UIB
> *Subject:* Re: [Corpora-List] Interesting Corpus analysis tools &
> specific corpora
>
>
>
> You might find Språkbanken (https://spraakbanken.gu.se/korp/) useful. The
> tool contains various Swedish corpora and a sophisticated search interface.
> The site also has an English version which can be accessed in the top right
> corner.
>
>
>
> Adam
>
>
> ------------------------------
>
> *From:* corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of
> Irina Temnikova <irina.temnikova at gmail.com>
> *Sent:* Wednesday, July 18, 2018 8:55:04 PM
> *To:* corpora at uib.no
> *Subject:* [Corpora-List] Interesting Corpus analysis tools & specific
> corpora
>
>
>
> *Hi all!*
>
>
>
> *I am trying to update a group of (not computational) linguists about the
> currently _accessible corpora_ and working _corpus analysis tools_.*
>
>
>
> *I am aware of the most famous tools and multilingual/English corpora.*
>
> **I would be extremely thankful if somebody could point me towards the
> following:**
>
>
>
> *1. I am interested in any corpus analysis tools that are usable by
> linguists and*
>
> *are **different** from the usual concordancers, keyword/term extractors,
> and collocation tools, i.e. different from:*
>
> *AntConc, WordSmith Tools, Sketch Engine (although it is amazingly great!
> :) ), and LIWC. Not NLTK either: it is too complex for my audience ;).*
>
> *It would be nice if the tools offer some syntactic analysis, for example.*
>
> *It would be better if the tools could be used with the user’s own
> corpora, and if they are easy to use.*
>
>
>
> *2. I am interested in corpora with texts in the following languages
> (especially learners’ corpora, social media corpora, parallel corpora):*
>
>
>
> *Italian - especially medieval historical*
>
> *Norwegian*
>
> *Swedish*
>
> *French, specifically social media (e.g. tweets), dialogues between
> foreigners*
>
> *Spanish tourism*
>
> *Modern Greek*
>
> *Swahili*
>
> *Afrikaans*
>
>
>
> Thank you very much in advance!
>
>
>
> Irina Temnikova
>
>
>
> --
>
> *Irina P. Temnikova, B.A., M.A., Ph.D.*
>
> *Lecturer & Computational Linguistics Researcher*
>
> Sofia University (past Qatar Computing Research Institute & Bulgarian
> Academy of Sciences)
>
> *https://scholar.google.bg/citations?user=7BcpifAAAAAJ&hl=en*
>
> ------------------------------- --------------------------------
> -----
>
> *Woke up*
>
>
>

--

Janne Bondi Johannessen <http://www.hf.uio.no/multiling/english/people/core-group/jannebj/index.html> Professor, University of Oslo & Editor of Norsk Lingvistisk Tidsskrift

The Text Laboratory, ILN & Center for Multilingualism in Society across the Lifespan

P.O.Box 1102 Blindern, 0317 Oslo, Norway

Tel: +47 22 85 68 14, mob.: +47 928 966 34, e-mail: jannebj at iln.uio.no


