[Corpora-List] POS annotated corpora (update)

Horsmann, Tobias tobias.horsmann at uni-due.de
Thu Jul 28 17:10:29 CEST 2016


Hi everyone,

I recently asked for suggestions for publicly available POS-annotated corpora. Thanks for the answers. As promised, here is my updated list.

I am still looking for more POS-annotated corpora, so if you know of any others that are available, please tell me :)

Norwegian (http://www.nb.no/sprakbanken/show?serial=sbr-10)
Brazilian Portuguese Newswire (http://www.nltk.org/nltk_data/)
Dutch Alpino (https://www.let.rug.nl/vannoord/trees/)
Spanish (https://www.iula.upf.edu/recurs01_tbk_uk.htm)
Italian-TurinTree/Parallel (http://www.di.unito.it/~tutreeb/treebanks.html)
Polish National Corpus (http://nkjp.pl/index.php?page=14&lang=1)
Icelandic-Historical Corpus (http://linguist.is/icelandic_treebank/Icelandic_Parsed_Historical_Corpus_(IcePaHC))
Icelandic (http://www.malfong.is/index.php?lang=en&pg=mim)
Slovene-English Parallel Corpus (http://nl.ijs.si/elan/)
Finnish Treebank (http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/)
German Tiger (http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)

--Newly added------------------------------------------------------------
German Hamburg Treebank (https://corpora.uni-hamburg.de/drupal/en/islandora/object/treebank:hdt)
Russian Open Corpus (http://opencorpora.org/?page=downloads)
Multilingual Universal Dependencies (http://universaldependencies.org/)
Italian-Pisa (http://www.corpusitaliano.it/en/contents/description.html)
English (https://corpling.uis.georgetown.edu/gum/)
Coptic (https://github.com/CopticScriptorium/corpora)
French (https://deep-sequoia.inria.fr/corpus/)
French (https://perso.limsi.fr/pap/free_multitag.tgz)
Danish (https://code.google.com/p/copenhagen-dependency-treebank/)
Croatian (http://nlp.ffzg.hr/resources/corpora/setimes-hr/)
Swedish Talbanken (http://stp.lingfil.uu.se/%7Emojgan/UPDT.html)
English Ted Talk Treebank (http://ahclab.naist.jp/resource/tedtreebank/)

Best,
Tobias


