[Corpora-List] POS annotated corpora

Richard Eckart de Castilho eckart at ukp.informatik.tu-darmstadt.de
Thu Jul 21 22:23:26 CEST 2016

On 20.07.2016, at 14:24, Horsmann, Tobias <tobias.horsmann at uni-due.de> wrote:
> German Tiger (http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)

Tiger can be downloaded via a deep link, but the website seems to be built so that users are expected to click through the license for academic use first.

The Universal Dependency Treebank contains POS tags:

- http://universaldependencies.org

The DKPro Core documentation [1] lists references to various corpora. I think most (if not all) of the corpora listed under the CoNLL 2006 section fit your criteria, e.g.:

- https://code.google.com/p/copenhagen-dependency-treebank/
- http://nlp.ffzg.hr/resources/corpora/setimes-hr/
- http://stp.lingfil.uu.se/%7Emojgan/UPDT.html
- https://gforge.inria.fr/projects/sequoiabank/
- ...

Also check the TigerXML section.


-- Richard

[1] https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-Conll2006

-- 
-------------------------------------------------------------------
Dr. Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-25299, fax -25295, room S2/02/B117
eckart at ukp.informatik.tu-darmstadt.de
www.ukp.tu-darmstadt.de

Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources (AIPHES): www.aiphes.tu-darmstadt.lde
PhD program: Knowledge Discovery in Scientific Literature (KDSL): www.kdsl.tu-darmstadt.de
-------------------------------------------------------------------
