[Corpora-List] HeidelTime 1.8: cross-domain temporal tagger for 11 languages

Jannik Strötgen jannik.stroetgen at gmail.com
Tue Dec 16 10:32:51 CET 2014


Dear list members,

We are happy to announce the release of version 1.8 of our multilingual, cross-domain, and easy-to-extend temporal tagger HeidelTime. [1]

With this new version, we have added Croatian resources, developed and kindly provided by Luka Skukan et al. (University of Zagreb). [2] Furthermore, the Italian resources were significantly improved in the context of the EVALITA-2014 EVENTI task. [3] Finally, we have made several processing speed and stability improvements to both the UIMA kit and the standalone version.

HeidelTime now supports 11 languages (in alphabetical order): Arabic, Chinese, Croatian, Dutch, English, French, German, Italian, Russian, Spanish, and Vietnamese.

In all languages, HeidelTime distinguishes between news-style documents and narrative-style documents (e.g., Wikipedia articles). For English, colloquial texts (e.g., tweets and SMS messages) and scientific articles (e.g., clinical trials) are also supported.

HeidelTime is available on Google Code [1] both as a UIMA component and as a standalone Java version. If you would like to try it out quickly, there is also an online demo. [4]

In addition to HeidelTime itself, the UIMA HeidelTime kit contains several collection readers and CAS consumers (mainly for processing temporally annotated corpora) as well as analysis engines wrapping several part-of-speech taggers to perform linguistic preprocessing in all supported languages.

Any kind of feedback is highly appreciated!

Best regards,
The HeidelTime Team
http://code.google.com/p/heideltime/
https://twitter.com/HeidelTime

[1] http://code.google.com/p/heideltime/ (downloads: http://code.google.com/p/heideltime/wiki/Downloads)

[2] Luka Skukan, Goran Glavaš, and Jan Šnajder (2014): HeidelTime.Hr: Extracting and Normalizing Temporal Expressions in Croatian. In Proceedings of the 9th Language Technologies Conference, pages 99-103. (http://nl.ijs.si/isjt14/proceedings/isjt2014_17.pdf)

[3] Giulio Manfredi, Jannik Strötgen, Julian Zell, and Michael Gertz (2014): HeidelTime at EVENTI: Tuning Italian Resources and Addressing TimeML's Empty Tags. In Proceedings of the 4th International Workshop EVALITA-2014, pages 39-43. (http://dbs.ifi.uni-heidelberg.de/fileadmin/Team/jannik/publications/2014_EVALITA_ManfrediEtAl.pdf)

[4] http://heideltime.ifi.uni-heidelberg.de/heideltime/

--
Jannik Strötgen
Institute of Computer Science
Database Systems Research Group
Im Neuenheimer Feld 348
69120 Heidelberg
Germany
Phone: +49 (0) 6221 / 54-5709
eMail: stroetgen at informatik.uni-heidelberg.de
www: http://dbs.ifi.uni-heidelberg.de/
