[Corpora-List] POS tagger with custom features

Michele Filannino michele.filannino at cs.manchester.ac.uk
Sat Jul 13 12:03:04 CEST 2013


What do you mean by 'define'?

If it means that it should automatically extract them:

- I don't know of one (although, especially for morphological ones, you could
easily write a script to extract the features and put them in a tab-separated
format) [have a look at the file feature_factory.py <https://github.com/filannim/ManTIME/blob/master/components/feature_factory.py> as an example].
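A minimal sketch of such a script, assuming the sentences are already tokenised and tagged. The feature set (lowercased form, 2- and 3-character suffixes, capitalisation, digits, hyphens) and the function names are hypothetical and chosen for illustration; they are not taken from feature_factory.py. It writes one token per line with tab-separated columns, puts the POS tag last, and leaves a blank line between sentences:

```python
import sys
from typing import Iterable, List, TextIO, Tuple


def token_features(word: str) -> List[str]:
    """Hypothetical orthographic and morphological features for one token."""
    return [
        word.lower(),                                    # lowercased form
        word[-2:],                                       # 2-character suffix (whole word if shorter)
        word[-3:],                                       # 3-character suffix (whole word if shorter)
        "CAP" if word[:1].isupper() else "NOCAP",        # initial capital letter
        "DIGIT" if any(c.isdigit() for c in word) else "NODIGIT",
        "HYPHEN" if "-" in word else "NOHYPHEN",
    ]


def sentence_to_rows(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn [(word, pos), ...] into rows: word, features..., POS (tab-separated)."""
    return ["\t".join([word, *token_features(word), pos]) for word, pos in tagged]


def write_tsv(sentences: Iterable[Iterable[Tuple[str, str]]],
              out: TextIO = sys.stdout) -> None:
    """Write all sentences; a blank line marks each sentence boundary."""
    for sentence in sentences:
        for row in sentence_to_rows(sentence):
            out.write(row + "\n")
        out.write("\n")


if __name__ == "__main__":
    write_tsv([[("The", "DT"), ("U-boat", "NN"), ("sank", "VBD"), ("in", "IN"), ("1917", "CD")]])
```

Keeping every feature as a plain string column means the same file can be fed to any column-based trainer without further changes.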

else if it means that it should just consider them for the classification phase:

- I suggest using CRF++ (if you don't mind using Conditional Random Fields).
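Linear-chain CRFs are log-linear models, so this also matches the stated preference. CRF++ does not need to know what the columns mean: each training line holds a token and its feature columns separated by whitespace, with the gold tag in the last column, and a template file says which columns (and which neighbouring tokens, via `%x[row,col]`) become features. The sketch below generates such a template for a file laid out as word, N feature columns, POS; the function name, the window size and the particular feature combinations are assumptions for illustration:

```python
from typing import List


def crfpp_template(n_feature_cols: int, window: int = 2) -> List[str]:
    """Build CRF++ template lines for columns: word, f1..fN, POS.

    Column 0 is the word, columns 1..n_feature_cols are the custom features;
    the last column (the POS tag) is the label and is never referenced here.
    """
    lines: List[str] = []
    idx = 0
    # Word identity in a window of +/- `window` tokens around the current token.
    for offset in range(-window, window + 1):
        lines.append(f"U{idx:02d}:%x[{offset},0]")
        idx += 1
    # Each custom feature column of the current token.
    for col in range(1, n_feature_cols + 1):
        lines.append(f"U{idx:02d}:%x[0,{col}]")
        idx += 1
    # One combined unigram feature: previous word + current word.
    lines.append(f"U{idx:02d}:%x[-1,0]/%x[0,0]")
    # Bigram template: features over adjacent output labels (tag transitions).
    lines.append("B")
    return lines


if __name__ == "__main__":
    print("\n".join(crfpp_template(n_feature_cols=6)))
```

Saving the output as `template`, training would then be `crf_learn template train.data model` and tagging `crf_test -m model test.data`.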

else:

- sorry, please give me more details. :)

Bye, michele.

On Sat, Jul 13, 2013 at 9:18 AM, amin mansouri <aminmansouri2000 at gmail.com> wrote:


> Dear Colleagues,
>
> I need to train a POS tagger (with my own specific POS tags and some custom
> features).
>
> Does anyone know of a tool (POS tagger or sequence classifier) that lets
> you define custom features, such as orthographic or morphological
> features? Log-linear is preferred.
>
> For example, each sequence of training data looks like this:
>
> word1|feature_1|...|feature_N|POS
> word2|feature_1|...|feature_N|POS
> word3|feature_1|...|feature_N|POS
> ...
> wordN|feature_1|...|feature_N|POS
>
> --
> Amin.Mansouri
> Natural Language Processing lab,
> School of Electrical and Computer Eng,
> Tehran University
> Tel: +9821-6111-9719
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
> --
> Michele Filannino
>
> CDT PhD student in Computer Science
> Room IT301 - IT Building
> The University of Manchester
> filannim at cs.manchester.ac.uk
>
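To use training data in the pipe-delimited format from the quoted message with a column-based tool such as CRF++, the lines only need to be rewritten with tab separators, one token per line, and a blank line after each sequence. A minimal sketch, assuming each sequence is given as a list of `word|feature_1|...|feature_N|POS` strings and that no field contains a literal `|` or whitespace (the helper names are hypothetical, not part of any tool):

```python
from typing import Iterable, List, Optional


def pipe_to_columns(sequence: Iterable[str]) -> List[str]:
    """Convert 'word|f1|...|fN|POS' lines of one sequence into tab-separated lines."""
    rows: List[str] = []
    n_fields: Optional[int] = None
    for line in sequence:
        fields = line.strip().split("|")
        if n_fields is None:
            n_fields = len(fields)
        elif len(fields) != n_fields:
            # Column-based trainers expect the same number of columns on every token line.
            raise ValueError(f"expected {n_fields} fields, got {len(fields)}: {line!r}")
        # Columns are split on whitespace, so empty fields or inner spaces would shift them.
        if any(not f or any(c.isspace() for c in f) for f in fields):
            raise ValueError(f"empty field or whitespace inside a field: {line!r}")
        rows.append("\t".join(fields))
    return rows


def convert(sequences: Iterable[Iterable[str]]) -> str:
    """Join converted sequences, ending each one with a blank line."""
    return "".join("\n".join(pipe_to_columns(seq)) + "\n\n" for seq in sequences)
```

The validation is deliberately strict: a silently misaligned column would otherwise turn into a wrong feature rather than an error.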



More information about the Corpora mailing list