[Corpora-List] Open source HMM POS tagger

Yannick Versley versley
Mon Jan 7 22:17:05 CET 2013


LingPipe is shared-source, i.e., you can use it freely (as long as you don't sell the output) and you can look at the source, but you cannot create derivative works.

hunpos is a free (as in open source) HMM-based tagger http://code.google.com/p/hunpos/

If you want or need to do something more exotic than hunpos can do (and don't want to dig into hunpos' OCaml source code), the attached file implements a bare-bones but useful HMM tagger (with Kneser-Ney smoothing for the n-gram model and suffix backoff for the word model) in under 300 lines of Python code. It should be easy to adapt to special needs if you want flexibility rather than execution speed.
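To make the description above concrete, here is a rough sketch of the same kind of tagger. It is not the attached `model_tools.py`, which is not reproduced in this archive. It is a minimal illustration under these assumptions: a first-order (bigram) tag model with interpolated Kneser-Ney smoothing and a fixed discount of 0.75, a TnT-style suffix model for unseen words (successive abstraction over suffixes of up to 4 characters with a fixed weight `theta`), and Viterbi decoding. All class and function names are hypothetical.

```python
# Hypothetical sketch (NOT the attached model_tools.py): a bigram HMM tagger with
# interpolated Kneser-Ney tag transitions and a TnT-style suffix model for unseen words.
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def log(x):  # like math.log, but maps 0 to -inf instead of raising
    return math.log(x) if x > 0 else -math.inf


class KneserNeyBigram:
    """Interpolated Kneser-Ney estimate of P(tag | previous tag)."""
    def __init__(self, tag_seqs, discount=0.75):  # discount 0.75 is an assumption
        self.d = discount
        self.bigrams = Counter()
        for tags in tag_seqs:
            seq = [START] + list(tags) + [END]
            self.bigrams.update(zip(seq, seq[1:]))
        self.total = Counter()      # c(prev, *)
        self.followers = Counter()  # distinct tags seen after prev
        preceders = Counter()       # distinct tags seen before tag
        for (prev, tag), c in self.bigrams.items():
            self.total[prev] += c
            self.followers[prev] += 1
            preceders[tag] += 1
        n_types = len(self.bigrams)
        self.p_cont = {t: c / n_types for t, c in preceders.items()}

    def prob(self, prev, tag):
        p_cont = self.p_cont.get(tag, 0.0)  # continuation probability
        total = self.total[prev]
        if total == 0:
            return p_cont
        discounted = max(self.bigrams[(prev, tag)] - self.d, 0.0) / total
        return discounted + self.d * self.followers[prev] / total * p_cont


class SuffixEmissions:
    """P(word | tag): relative frequency for seen words, suffix model otherwise."""
    def __init__(self, tagged_sents, max_suffix=4, theta=0.1):  # assumed values
        self.max_suffix, self.theta = max_suffix, theta
        self.word_tag, self.tag_count = Counter(), Counter()
        self.suffix_tag = defaultdict(Counter)
        for sent in tagged_sents:
            for word, tag in sent:
                self.word_tag[(word, tag)] += 1
                self.tag_count[tag] += 1
                for k in range(1, min(max_suffix, len(word)) + 1):
                    self.suffix_tag[word[-k:]][tag] += 1
        self.vocab = {w for w, _ in self.word_tag}
        n = sum(self.tag_count.values())
        self.p_tag = {t: c / n for t, c in self.tag_count.items()}

    def tag_given_suffix(self, word):
        # Successive abstraction: start from P(tag), then interpolate towards ever
        # longer suffixes of `word`, stopping at the first suffix never seen in training.
        probs = dict(self.p_tag)
        for k in range(1, min(self.max_suffix, len(word)) + 1):
            counts = self.suffix_tag.get(word[-k:])
            if not counts:
                break
            total = sum(counts.values())
            probs = {t: (counts[t] / total + self.theta * p) / (1 + self.theta)
                     for t, p in probs.items()}
        return probs

    def log_prob(self, word, tag):
        if word in self.vocab:
            return log(self.word_tag[(word, tag)] / self.tag_count[tag])
        # Bayes: P(w|t) = P(t|w) P(w) / P(t), with P(t|w) approximated by the suffix
        # model; P(w) is the same for every tag, so it is dropped (unnormalised score).
        return log(self.tag_given_suffix(word)[tag] / self.p_tag[tag])


def viterbi(words, trans, emit):
    """Most probable tag sequence under the bigram HMM, computed in log space."""
    tags = list(emit.tag_count)
    best = {START: (0.0, [])}  # tag -> (log score, best path ending in that tag)
    for word in words:
        best = {t: max((score + log(trans.prob(prev, t)) + emit.log_prob(word, t),
                        path + [t])
                       for prev, (score, path) in best.items())
                for t in tags}
    _, path = max((score + log(trans.prob(prev, END)), path)
                  for prev, (score, path) in best.items())
    return path


train = [[("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
         [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
         [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")]]
trans = KneserNeyBigram([[tag for _, tag in sent] for sent in train])
emit = SuffixEmissions(train)
print(viterbi(["the", "dog", "jumps"], trans, emit))  # ['DET', 'NOUN', 'VERB']
```

Here the unseen word "jumps" is tagged VERB because its suffixes "s" and "ps" were only seen on verbs. Compared with TnT, this sketch simplifies two things: it uses a bigram instead of a trigram tag model, and it collects suffix statistics from all training words instead of only rare ones.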

Best wishes, Yannick Versley

On Mon, Jan 7, 2013 at 9:46 PM, Lushan Han <lushan1 at umbc.edu> wrote:


> I know LingPipe provides an HMM POS tagger which is open source.
>
> Best,
>
> Lushan
>
> On Mon, Jan 7, 2013 at 5:37 AM, Fatemeh Torabi Asr <torabiasr at gmail.com> wrote:
>
>> Dear all,
>>
>> I'm looking for an open-source, efficient HMM POS tagger to run on
>> something like an artificial language. I would like it to be configurable
>> for different N-gram sizes, to take the list of possible tags and a
>> dictionary (a small tagged corpus), and then to be trainable on a large
>> corpus of unannotated text.
>> I also wonder whether any of the existing *HMM-based* POS taggers consider
>> word features (not only the word form itself but a feature vector of
>> the observable properties of the word in the unlabeled text, e.g., some
>> semantic features attached to the word frame). So it would be great if a
>> state-of-the-art HMM tagger implementation supporting such a
>> representation of the states were already available.
>>
>> Best,
>> Fatemeh
>>
>>
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>
>
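The setup in the original question (a tag list, a small tag dictionary, and a large unannotated corpus) is classically handled with Baum-Welch (EM) training, restricting each word's emissions to the tags the dictionary allows. The sketch below is a first-order version under stated assumptions: every sentence is non-empty, and words missing from the dictionary may take any tag. The names `baum_welch`, `forward_backward` and `normalise` are hypothetical and are not taken from hunpos, LingPipe, or the attached `model_tools.py`.

```python
# Hypothetical sketch (not hunpos, LingPipe, or the attached model_tools.py):
# Baum-Welch (EM) training of a first-order HMM from unannotated sentences,
# with emissions restricted to the tags a dictionary allows for each word.
import math
from collections import defaultdict


def normalise(counts, keys):
    """Expected counts -> distribution over `keys` (all zeros if nothing was counted)."""
    z = sum(counts[k] for k in keys)
    return {k: (counts[k] / z if z > 0 else 0.0) for k in keys}


def forward_backward(words, init, trans, emit, tags):
    """Scaled forward-backward: alpha[i][t] * beta[i][t] is P(tag_i = t | words)."""
    alpha, scale = [], []
    for i, w in enumerate(words):
        a = {t: (init[t] if i == 0 else sum(alpha[-1][p] * trans[p][t] for p in tags))
                * emit[t].get(w, 0.0)
             for t in tags}
        c = sum(a.values())
        alpha.append({t: v / c for t, v in a.items()})
        scale.append(c)
    beta = [None] * len(words)
    beta[-1] = {t: 1.0 for t in tags}
    for i in range(len(words) - 2, -1, -1):
        w = words[i + 1]
        beta[i] = {p: sum(trans[p][t] * emit[t].get(w, 0.0) * beta[i + 1][t]
                          for t in tags) / scale[i + 1]
                   for p in tags}
    return alpha, beta, scale


def baum_welch(sents, tagdict, tags, iters=20):
    """sents: non-empty word lists; tagdict: word -> allowed tags (missing = any tag)."""
    tags = list(tags)
    vocab = {w for s in sents for w in s}
    init = {t: 1.0 / len(tags) for t in tags}
    trans = {p: {t: 1.0 / len(tags) for t in tags} for p in tags}
    # Start each tag's emissions uniform over the words the dictionary allows for it.
    allowed = {t: [w for w in vocab if t in tagdict.get(w, tags)] for t in tags}
    emit = {t: {w: 1.0 / len(ws) for w in ws} for t, ws in allowed.items()}
    loglik = float("nan")
    for _ in range(iters):
        c_init = defaultdict(float)
        c_trans = {p: defaultdict(float) for p in tags}
        c_emit = {t: defaultdict(float) for t in tags}
        loglik = 0.0  # log-likelihood under this iteration's (pre-update) parameters
        for words in sents:
            alpha, beta, scale = forward_backward(words, init, trans, emit, tags)
            loglik += sum(math.log(c) for c in scale)
            for t in tags:  # E-step: expected initial-tag and word/tag counts
                c_init[t] += alpha[0][t] * beta[0][t]
                for i, w in enumerate(words):
                    c_emit[t][w] += alpha[i][t] * beta[i][t]
            for i in range(len(words) - 1):  # expected tag-bigram counts
                w = words[i + 1]
                for p in tags:
                    for t in tags:
                        c_trans[p][t] += (alpha[i][p] * trans[p][t] * emit[t].get(w, 0.0)
                                          * beta[i + 1][t] / scale[i + 1])
        # M-step: re-estimate; word/tag pairs the dictionary forbids keep zero probability.
        init = normalise(c_init, tags)
        trans = {p: normalise(c_trans[p], tags) for p in tags}
        emit = {t: normalise(c_emit[t], list(c_emit[t])) for t in tags}
    return init, trans, emit, loglik
```

For a higher N-gram order, a common approach is to use tag pairs (or longer tag tuples) as the hidden states, which turns the same algorithm into a second-order HMM at the cost of a much larger state space.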

-- Dr. Yannick Versley

Sonderforschungsbereich 833 Universität Tübingen Nauklerstr. 35 72074 Tübingen

Tel.: +49-7071-29 77155

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/html
Size: 3771 bytes
Desc: not available
URL: <https://mailman.uib.no/public/corpora/attachments/20130107/2ae0bb79/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: model_tools.py
Type: application/octet-stream
Size: 8607 bytes
Desc: not available
URL: <https://mailman.uib.no/public/corpora/attachments/20130107/2ae0bb79/attachment.obj>


