[Corpora-List] Part of Speech annotation of Persian and Urdu corpora

Bushra Zawaydeh bzawaydeh at hotmail.com
Wed Feb 27 15:41:25 CET 2008


Hi Ben,

The question was about locating a company that would do the manual annotation for us, using a set of tags that we determine and following guidelines that we write. Are there companies out there that do that?

Thank you,
Bushra


> Date: Wed, 27 Feb 2008 11:44:36 +0000
> From: B.Allison at dcs.shef.ac.uk
> To: corpora at uib.no
> Subject: Re: [Corpora-List] Part of Speech annotation of Persian and Urdu corpora
>
> Bushra,
>
> I'm not sure whether you want human-annotated text from which to induce
> a tagger, or are interested in having a working POS tagger itself. If
> the latter, then about a year ago we tracked down a 10 million word
> corpus of Persian which had been hand-annotated, and induced a tagger
> from the 1 million word part that the creators were prepared to give
> away for research purposes. The tagset they used (which they created for
> the job) could be interpreted on two levels -- there was a coarse tagset
> of 14 tags with categories like Noun, Verb, etc. and a much finer one
> which I believe ran to about 150 tags. Accuracies were pretty good --
> over 98% for coarse tags, and around 92% for the fine ones.
>
> I'm not sure if you're prepared for a DIY approach, but I suspect that
> if you are, you could get hold of the corpus we used (I can pass you
> contact information) and use one of many trainable taggers to induce
> your own. Of course, this might not be what you were thinking of...
>
> Ben
>
> hfaili at ece.ut.ac.ir wrote:
> > Dear Bushra,
> > I work at an Iranian company (Douran, www.douran.com) which
> > has good experience and tools for POS tagging and other NLP fields
> > in Persian...
> > For more information, contact me via hfaili at douran.com
> > Regards
> >
> > hello
> > I was wondering if anybody knows of any companies or individual linguists
> > who would do Part of Speech annotation of Persian and Urdu corpora?
> >
> > Thank you
> > Bushra Zawaydeh
> >
> > ********************************************************************
> > Bushra Zawaydeh bushraz at basistech.com
> > Senior Linguist
> > Basis Technology Tel: (617)386-7130
> > One Alewife Center Fax: (617)386-2020
> > Cambridge, MA 02140-2327
> > USA
> > **********************************************************************
> >
> >
> >
> > _______________________________________________
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
> >
>



