[Corpora-List] Interesting Corpus analysis tools & specific corpora

Amir Zeldes Amir.Zeldes at georgetown.edu
Fri Jul 20 01:08:33 CEST 2018

Hi Irina,

If you’re interested in tools covering data types beyond POS-tagged concordances, in particular syntactically annotated data and complex user-defined annotation types, you may want to check out ANNIS:


We also offer some richly annotated corpora via an ANNIS server at Georgetown University, some of which are in languages you mentioned below, so you can see some of what the system can do here:


We also serve flat annotated corpora, including in languages on your list, using a CQPweb interface here:


Hope this helps,



Dr. Amir Zeldes

Asst. Prof. of Computational Linguistics

Department of Linguistics

Georgetown University

1437 37th St. NW

Washington, DC 20057


From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Irina Temnikova
Sent: Wednesday, July 18, 2018 2:55 PM
To: corpora at uib.no
Subject: [Corpora-List] Interesting Corpus analysis tools & specific corpora

Hi all!

I am trying to update a group of (non-computational) linguists on the currently _accessible corpora_ and working _corpus analysis tools_.

I am aware of the most famous tools and multilingual/English corpora.

*I would be extremely thankful if somebody could point me towards the following:*

1. I am interested in any corpus analysis tools that are usable by linguists and are **different** from the usual concordancers, keyword/term extractors, and collocation tools, i.e. different from:

AntConc, WordSmith Tools, SketchEngine (although it is amazingly great! :) ), and LIWC. No NLTK either -- too complex for my audience ;).

It would be nice if the tools offer some syntactic analysis, for example.

*It would be better if the tools could be used with the user’s own corpora*, and if they are easy to use.

2. I am interested in corpora with texts in the following languages (especially learners’ corpora, social media corpora, parallel corpora):

Italian - especially medieval historical



French, specifically social media (e.g. tweets), dialogues between foreigners

Spanish tourism

Modern Greek



Thank you very much in advance!

Irina Temnikova


Irina P. Temnikova, B.A., M.A., Ph.D.

Lecturer & Computational Linguistics Researcher

Sofia University (past Qatar Computing Research Institute & Bulgarian Academy of Sciences)

https://scholar.google.bg/citations?user=7BcpifAAAAAJ&hl=en

