[Corpora-List] Interesting Corpus analysis tools & specific corpora

Jakub Kozakoszczak jkozakoszczak at gmail.com
Wed Jul 18 21:22:05 CEST 2018


Dear Irina,

I like to familiarize students with Sublime Text <https://www.sublimetext.com/> (a code and prose editor) so they can practice regular expressions as a corpus-related skill. I chose this one because it is lightweight and cross-platform, has a plain, non-distracting GUI, works fast on huge files, and highlights regular expression matches cleanly in real time. It lets you search many files in bulk and copy and process all matches. Although it doesn't support searching by tags, in my opinion it gives the less computer-oriented students a secure and pleasant start to corpus work.
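For comparison, the same kind of bulk regular-expression search can be scripted once students are comfortable with the patterns. The following is a minimal Python sketch, not part of Sublime Text. It assumes the corpus is a folder of plain UTF-8 `.txt` files, and the helper name `find_in_files` is hypothetical. It walks every file, applies one pattern line by line, and yields each match with a little left and right context (a simple KWIC-style view):

```python
import re
from pathlib import Path


def find_in_files(folder, pattern, context=30):
    """Yield (file name, line number, left context, match, right context)
    for every regex match in every .txt file under `folder`.

    Hypothetical helper: assumes plain-text UTF-8 files.
    """
    regex = re.compile(pattern)
    # Sort the paths so results come out in a stable, predictable order.
    for path in sorted(Path(folder).rglob("*.txt")):
        with path.open(encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                for m in regex.finditer(line):
                    # Up to `context` characters on each side of the match.
                    left = line[max(0, m.start() - context):m.start()]
                    right = line[m.end():m.end() + context].rstrip("\n")
                    yield path.name, lineno, left, m.group(0), right
```

For example, `find_in_files("corpus", r"\b\w+ness\b")` (with `corpus` standing in for your own folder) would yield every word ending in *-ness*, one tuple per match. In Python 3, `\w` matches Unicode letters by default, so the same pattern works on non-English texts.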

Best, Jakub Kozakoszczak

On 18 July 2018 at 20:55, Irina Temnikova <irina.temnikova at gmail.com> wrote:


> Hi all!
>
>
> I am trying to update a group of (non-computational) linguists on
> currently _accessible corpora_ and working _corpus analysis tools_.
>
>
> I am aware of the best-known tools and multilingual/English corpora.
>
> I would be extremely thankful if somebody could point me towards the
> following:
>
>
> 1. I am interested in any corpus analysis tools that are usable by
> linguists and are different from the usual concordancers, keyword/term
> extractors, and collocation tools, i.e. different from:
>
> AntConc, WordSmith Tools, Sketch Engine (although it is amazingly great!
> :) ), LIWC, and not NLTK, which is too complex for my audience ;).
>
> It would be nice if the tools offered some syntactic analysis, for example.
>
> It would be better if the tools could be used with the user’s own
> corpora and were easy to use.
>
>
> 2. I am interested in corpora with texts in the following languages
> (especially learner corpora, social media corpora, and parallel corpora):
>
>
> Italian, especially medieval historical texts
>
> Norwegian
>
> Swedish
>
> French, specifically social media (e.g. tweets) and dialogues between
> foreigners
>
> Spanish, tourism domain
>
> Modern Greek
>
> Swahili
>
> Afrikaans
>
>
> Thank you very much in advance!
>
> Irina Temnikova
>
> --
> Irina P. Temnikova, B.A., M.A., Ph.D.
>
> Lecturer & Computational Linguistics Researcher
>
> Sofia University (formerly Qatar Computing Research Institute & Bulgarian
> Academy of Sciences)
>
> https://scholar.google.bg/citations?user=7BcpifAAAAAJ&hl=en
>
> Woke up
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora
>
>


