[Corpora-List] Guidelines for creating Gold Standard Alignment

Nitin Madnani nmadnani at gmail.com
Fri Apr 16 16:35:30 CEST 2010


There has been work on creating gold-standard alignments. See the following:

(1) The annotation style guide for the Blinker project by Dan Melamed. Even though this was written for the purpose of creating English-French alignments using the Blinker tool, some of the guidelines still carry over to the general case. http://repository.upenn.edu/cgi/viewcontent.cgi?article=1054&context=ircs_reports

(2) Annotation guidelines for creating paraphrase alignments by Callison-Burch, Cohn and Lapata. Even though this guide is to help create alignments between sentences in the same language (English), it might still be useful. http://www.dcs.shef.ac.uk/~tcohn/paraphrase_guidelines.pdf

(3) A more comprehensive collection of word alignment guidelines can be found on Rada Mihalcea's web page: http://www.cse.unt.edu/~rada/wa/#guidelinesWA
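
Once a gold standard exists, a tool's output can be scored against it. Below is a minimal Python sketch that reads the alignment notation used in the quoted message further down and compares a hypothesis alignment with the gold links. The reading of the notation (sentence id # token count # indexed tokens # one target index per token, with 0 meaning unaligned) is an assumption inferred from that example, and the function names are made up for illustration. It computes plain precision, recall and F1 over links and does not distinguish sure from possible links.

```python
import re


def parse_alignment_line(line):
    """Parse 'id# n # tok(1) tok(2) ... # a1 a2 ...' into (tokens, links).

    Assumed layout (inferred from the example in the thread, not from a spec):
    fields are separated by '#', tokens carry a 1-based '(i)' suffix, and the
    last field gives one target index per token, where 0 means 'unaligned'.
    Tokens that themselves contain '#' are not handled by this sketch.
    """
    parts = [p.strip() for p in line.split("#")]
    count = int(parts[1])
    tokens = [re.sub(r"\(\d+\)$", "", tok) for tok in parts[2].split()]
    targets = [int(a) for a in parts[3].split()]
    if not (len(tokens) == len(targets) == count):
        raise ValueError("token count and alignment length disagree")
    # Keep only real links as (source index, target index) pairs.
    links = {(i, t) for i, t in enumerate(targets, start=1) if t != 0}
    return tokens, links


def precision_recall_f1(hypothesis, gold):
    """Score a set of hypothesis links against a set of gold links."""
    correct = len(hypothesis & gold)
    precision = correct / len(hypothesis) if hypothesis else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold_line = "1# 5 # does(1) he(2) go(3) home(4) ?(5) # 4 2 4 3 0"
    tokens, gold = parse_alignment_line(gold_line)
    # Hypothetical tool output that links "does" to token 1 instead of token 4.
    hypothesis = {(1, 1), (2, 2), (3, 4), (4, 3)}
    print(tokens, sorted(gold))
    print(precision_recall_f1(hypothesis, gold))
```

For the English line of the quoted example this yields the gold links (1,4), (2,2), (3,4) and (4,3); the hypothetical output above shares three of its four links with the gold set, so precision, recall and F1 all come out at 0.75.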

Cheers, Nitin

On Fri, Apr 16, 2010 at 1:20 AM, mohnish jadwani <mohnishgj at gmail.com> wrote:
> Respected Readers,
> Creating a Gold Standard Alignment is of vital importance when one
> has to evaluate the output of word alignment tools like Giza++ on a
> bilingual corpus. This Gold Standard Alignment (test data), as many of us
> know, serves as a reference against which one can evaluate the results
> obtained using the training data. When one creates this test data, which
> is a subset of the training data, manually, one comes across a lot of
> variation between the source and target languages while aligning words,
> e.g.
>
>
> 1# 5 # does(1) he(2) go(3) home(4) ?(5) # 4 2 4 3 0
>
> 1# 5 # क्या(1) वह(2) घर(3) जाता(4) है(5) # 0 2 4 3 0
>
> Here the word "does" maps to the 'ता' part of 'जाता' (token 4), i.e. to
> part of a word rather than to a whole word.
>
> There are many such considerations one has to keep in mind while
> creating a Gold Standard Alignment.
>
> Could you please suggest any basic guidelines (even if not specific to
> English-Hindi) that one could follow for this? Any reference paper or
> advice would be of great help.
>
> Thanking You
>
> Mohnish

-- Got Blog? http://greenideas.blogspot.com


