We will also present two papers on our word alignment efforts at LREC 2010, which should be available in the proceedings and on LDC's website.
Enriching Word Alignment with Linguistic Tags - Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie Strassel and Kazuaki Maeda
Creating Arabic-English Parallel Word-Aligned Treebank Corpora at LDC - Stephen Grimes, Xuansong Li, Ann Bies, Seth Kulick, Xiaoyi Ma and Stephanie Strassel
Nitin Madnani wrote:
> There has been work on creating gold-standard alignments. See the following:
> (1) The annotation style guide for the Blinker project by Dan Melamed.
> Even though this was written for the purpose of creating
> English-French alignments using the Blinker tool, some of the
> guidelines still carry over to the general case.
> (2) Annotation guidelines for creating paraphrase alignments by
> Callison-Burch, Cohn and Lapata. Even though this guide is to help
> create alignments between sentences in the same language (English), it
> might still be useful.
> (3) A more comprehensive collection of word alignment guidelines can
> be found on Rada Mihalcea's web page:
> On Fri, Apr 16, 2010 at 1:20 AM, mohnish jadwani <mohnishgj at gmail.com> wrote:
>> Respected Readers,
>> Creating a Gold Standard Alignment is of vital importance when one
>> has to evaluate the results of word alignment tools like GIZA++ on a
>> bilingual corpus. This Gold Standard Alignment (test data), as many of us
>> know, serves as a reference against which one can evaluate the results
>> obtained using the training data. When this test data, which is a subset
>> of the training data, is created manually, one comes across a lot of
>> variation between the source and target languages while aligning words,
>> e.g.
>> 1# 5 # does(1) he(2) go(3) home(4) ?(5) # 4 2 4 3 0
>> 1# 5 # क्या(1) वह(2) घर(3) जाता(4) है(5) # 0 2 4 3 0
>> Here the word "does" maps to the 'ता' of 'जाता'.
>> There are many such considerations one has to keep in mind while
>> creating a Gold Standard Alignment.
>> Could you please suggest any basic guidelines (even if they are not
>> specific to English-Hindi) that one could follow for this? Any
>> reference paper or advice would be of great help.
>> Thanking You
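For readers who want to experiment with the example quoted above, here is a minimal sketch of how such lines could be read and compared. It rests on an assumption that the original message does not spell out: each line is taken to hold a sentence id, a token count, the tokens with their positions, and, for each token, the position of its counterpart in the other sentence, with 0 meaning "unaligned". The function names and the `predicted` set are invented for illustration and do not come from any tool.

```python
import re

def parse_alignment_line(line):
    """Parse 'id# n # tok(1) ... tok(n) # a1 ... an' into (id, tokens, targets)."""
    sent_id, length, token_field, target_field = (f.strip() for f in line.split("#"))
    # Drop the "(k)" position suffix from each token.
    tokens = [re.sub(r"\(\d+\)$", "", tok) for tok in token_field.split()]
    targets = [int(t) for t in target_field.split()]
    if not (int(length) == len(tokens) == len(targets)):
        raise ValueError(f"length mismatch in sentence {sent_id}")
    return int(sent_id), tokens, targets

def links(targets):
    """Set of (position, target) pairs; a target of 0 is treated as unaligned."""
    return {(i, t) for i, t in enumerate(targets, start=1) if t != 0}

def precision_recall(gold, predicted):
    """Precision and recall of predicted links against gold links."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

english = "1# 5 # does(1) he(2) go(3) home(4) ?(5) # 4 2 4 3 0"
hindi = "1# 5 # क्या(1) वह(2) घर(3) जाता(4) है(5) # 0 2 4 3 0"

_, en_tokens, en_targets = parse_alignment_line(english)
_, hi_tokens, hi_targets = parse_alignment_line(hindi)

en_to_hi = links(en_targets)                       # (English pos, Hindi pos)
hi_to_en = {(e, h) for h, e in links(hi_targets)}  # flipped to the same orientation

# Links annotated on the English side but not on the Hindi side.
only_en_side = en_to_hi - hi_to_en
for e, h in sorted(only_en_side):
    print(f"{en_tokens[e - 1]} -> {hi_tokens[h - 1]} is marked on the English side only")

# Hypothetical system output, made up purely to show the scoring step.
predicted = {(2, 2), (3, 4), (1, 1)}
precision, recall = precision_recall(en_to_hi, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading the two annotation directions disagree only on "does", which is exactly the kind of case where written guidelines help. A many-to-one link cannot be expressed on the side that allows a single target per word.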
-- Stephanie Strassel
Senior Associate Director
Linguistic Data Consortium
3600 Market Street, Suite 810 Philadelphia, PA 19104-2653 USA
strassel at ldc.upenn.edu