[Corpora-List] Interesting Corpus analysis tools & specific corpora

Serge Heiden slh at ens-lyon.fr
Tue Jul 31 15:10:54 CEST 2018


Dear Irina,

You may want to check out the free and open-source TXM platform: http://textometrie.ens-lyon.fr/?lang=en

It provides a comprehensive combination of classical Anglo-Saxon corpus linguistics tools and French data analysis tools, in support of a text corpus analysis methodology called textometry.

TXM is available:
- as an end-user desktop application for Windows, Mac & Linux
- or as a web portal application, accessed online by end users through a web browser (see an indicative list of production-grade online TXM portals here: https://groupes.renater.fr/wiki/txm-users/public/references_portails)

TXM provides a friendly end-user graphical interface for text analysis, while using in the background:
- the CQP search engine and R statistical packages
- XML TEI text encoding to represent:
  a] written texts from various formats: TXT, Word, XML, TEI...
  b] synchronized transcriptions of recordings from the XML Transcriber format
  c] parallel corpora from the XML TMX format
- the usual word-level lexical annotations (pos, lemma...) and text-level structural annotations (chapter, section, paragraph, quote...) as standard, plus additional annotations provided by plugins (e.g. TIGER Search syntactic annotations, Unit-Relation-Schema URS annotations)

See:
- the TXM leaflet: http://sourceforge.net/projects/txm/files/documentation/TXM%20Leaftlet%20EN.pdf/download
- a first draft of an English version of the TXM user manual (a bit outdated): http://textometrie.ens-lyon.fr/files/documentation/TXM%20Manual%200.7.pdf

Several annotation services are currently being implemented in TXM by different projects; see their current description in the TXM manual: http://textometrie.ens-lyon.fr/html/doc/manual/0.7.9/fr/manual49.xhtml#toc276 (currently only in French, sorry)

Best,
Serge

On 18/07/2018 at 20:55, Irina Temnikova wrote:
>
> Hi all!
>
> I am trying to update a group of (non-computational) linguists on currently accessible corpora and working corpus analysis tools.
>
> I am aware of the most famous tools and multilingual/English corpora.
>
> I would be extremely thankful if somebody could point me towards the following:
>
> 1. I am interested in any corpus analysis tools that are usable by linguists and are different from the usual concordancers, keyword/term extractors, and collocation tools, i.e. different from: AntConc, WordSmith Tools, SketchEngine (although it is amazingly great! :) ), LIWC; and no NLTK -- too complex for my audience ;).
>
> It would be nice if the tools offered some syntactic analysis, for example.
>
> It would be better if the tools could be used with the user’s own corpora, and if they are easy to use.
>
>
> 2. I am interested in corpora with texts in the following languages (especially learners’ corpora, social media corpora, parallel corpora):
>
> Italian - especially medieval historical
> Norwegian
> Swedish
> French, specifically social media (e.g. tweets), dialogues between foreigners
> Spanish tourism
> Modern Greek
> Swahili
> Afrikaans
>
>
> Thank you very much in advance!
>
> Irina Temnikova
>
> --
> Irina P. Temnikova, B.A., M.A., Ph.D.
> Lecturer & Computational Linguistics Researcher
> Sofia University (formerly Qatar Computing Research Institute & Bulgarian Academy of Sciences)
> https://scholar.google.bg/citations?user=7BcpifAAAAAJ&hl=en
>

--
Dr. Serge Heiden, slh at ens-lyon.fr, http://textometrie.ens-lyon.fr
ENS de Lyon - IHRIM UMR5317
15, parvis René Descartes 69342 Lyon BP7000 Cedex
tel. +33(0)622003883
