[Corpora-List] POS annotated corpora (update)

Alberto Simões ambs at di.uminho.pt
Thu Jul 28 17:14:34 CEST 2016


There are many corpora for European Portuguese at http://linguateca.pt/ACDC, most of which can be downloaded freely.

Best,

On 28/07/16 16:10, Horsmann, Tobias wrote:
> Hi everyone,
>
> I recently asked for suggestions of publicly available POS-annotated
> corpora.
>
> Thanks for the answers. As promised, here is my updated list.
>
> I am still looking for more POS-annotated corpora, so if you know of
> any others, please let me know :)
>
> Norwegian (http://www.nb.no/sprakbanken/show?serial=sbr-10)
>
> Brazilian Portuguese Newswire (http://www.nltk.org/nltk_data/)
>
> Dutch Alpino (https://www.let.rug.nl/vannoord/trees/)
>
> Spanish (https://www.iula.upf.edu/recurs01_tbk_uk.htm)
>
> Italian-TurinTree/Parallel (http://www.di.unito.it/~tutreeb/treebanks.html)
>
> Polish National Corpus (http://nkjp.pl/index.php?page=14&lang=1)
>
> Icelandic-Historical Corpus
> (http://linguist.is/icelandic_treebank/Icelandic_Parsed_Historical_Corpus_(IcePaHC))
>
> Icelandic (http://www.malfong.is/index.php?lang=en&pg=mim)
>
> Slovene-English Parallel Corpus (http://nl.ijs.si/elan/)
>
> Finnish Treebank
> (http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/)
>
> German Tiger
> (http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)
>
> --Newly added------------------------------------------------------------
>
> German Hamburg Treebank
> (https://corpora.uni-hamburg.de/drupal/en/islandora/object/treebank:hdt)
>
> Russian Open Corpus (http://opencorpora.org/?page=downloads)
>
> Multilingual Universal Dependencies (http://universaldependencies.org/)
>
> Italian-Pisa (http://www.corpusitaliano.it/en/contents/description.html)
>
> English (https://corpling.uis.georgetown.edu/gum/)
>
> Coptic (https://github.com/CopticScriptorium/corpora)
>
> French (https://deep-sequoia.inria.fr/corpus/)
>
> French (https://perso.limsi.fr/pap/free_multitag.tgz)
>
> Danish (https://code.google.com/p/copenhagen-dependency-treebank/)
>
> Croatian (http://nlp.ffzg.hr/resources/corpora/setimes-hr/)
>
> Swedish Talbanken (http://stp.lingfil.uu.se/%7Emojgan/UPDT.html)
>
> English TED Talk Treebank (http://ahclab.naist.jp/resource/tedtreebank/)
>
> Best,
>
> Tobias
>
>
