[Corpora-List] Part of Speech annotation of Persian and Urdu corpora

Ben Allison B.Allison at dcs.shef.ac.uk
Wed Feb 27 15:52:35 CET 2008


Bushra,

I suspect there are. However, in my experience, handing the annotation over to someone else to carry out according to your guidelines is dangerous if the annotation scheme you propose is untested. The annotation process should ideally be symbiotic, with categories and schemes refined as preliminary annotation is performed. Otherwise, you will either impose an annotation scheme which may ultimately prove unsuitable, or you may lose control of the eventual scheme. Are there strong reasons for you not to arrange or perform the annotation yourself?

Ben

Bushra Zawaydeh wrote:
> hi Ben
> the question was about locating a company that would do the manual
> annotation for us using a set of tags that we determine, according to
> guidelines that we write. Are there companies out there that do that?
> thank you
> Bushra
>
> > Date: Wed, 27 Feb 2008 11:44:36 +0000
> > From: B.Allison at dcs.shef.ac.uk
> > To: corpora at uib.no
> > Subject: Re: [Corpora-List] Part of Speech annotation of Persian and Urdu corpora
> >
> > Bushra,
> >
> > I'm not sure whether you want human-annotated text from which to induce
> > a tagger, or are interested in having a working POS tagger itself. If
> > the latter, then about a year ago we tracked down a 10 million word
> > corpus of Persian which had been hand-annotated, and induced a tagger
> > from the 1 million word part that the creators were prepared to give
> > away for research purposes. The tagset they used (which they created for
> > the job) could be interpreted on two levels -- there was a coarse tagset
> > of 14 tags with categories like Noun, Verb, etc. and a much finer one
> > which I believe ran to about 150 tags. Accuracies were pretty good --
> > over 98% for coarse tags, and around 92% for the fine ones.
> >
> > I'm not sure if you're prepared for a DIY approach, but I suspect that
> > if you are, you could get hold of the corpus we used (I can pass you
> > contact information) and use one of many trainable taggers to induce
> > your own. Of course, this might not be what you were thinking of...
> >
> > Ben
> >
> > hfaili at ece.ut.ac.ir wrote:
> > > Dear Bushra,
> > > I am working in an Iranian company (named Douran, www.douran.com) which
> > > has good experience and tools for POS tagging and other NLP fields
> > > in Persian...
> > > for more information contact me via hfaili at douran.com
> > > regards
> > >
> > > hello
> > > I was wondering if anybody knows of any companies or individual linguists
> > > who would do Part of Speech annotation of Persian and Urdu corpora?
> > >
> > > Thank you
> > > Bushra Zawaydeh
> > >
> > > ********************************************************************
> > > Bushra Zawaydeh bushraz at basistech.com
> > > Senior Linguist
> > > Basis Technology Tel: (617)386-7130
> > > One Alewife Center Fax: (617)386-2020
> > > Cambridge, MA 02140-2327
> > > USA
> > > **********************************************************************
> > >
> > > _______________________________________________
> > > Corpora mailing list
> > > Corpora at uib.no
> > > http://mailman.uib.no/listinfo/corpora
> > >


