[Corpora-List] (no subject)

Eirini LS eirini_ls
Thu Jan 17 15:57:11 CET 2013

I mean that I have two different scripts for the same word (e.g. two scripts for "cat"), written by different people. The first script generates 358 words, of which only 107 are correct; the second generates 497 words, of which 471 are correct. Can I say that the result of the first script is worse? Once again, sorry for bothering you.

Irina L
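If the hand-validated counts are the only evidence available, one rough way to compare the two scripts is precision: the share of generated forms that were judged correct. The Python sketch below just does that arithmetic on the counts from the message. The function name `precision` is illustrative; nothing here comes from xfst.

```python
# Counts from the message: (forms generated, forms judged correct by hand).
results = {
    "script 1": (358, 107),
    "script 2": (497, 471),
}


def precision(generated: int, correct: int) -> float:
    """Share of generated forms that were judged correct."""
    if generated == 0:
        raise ValueError("no forms generated")
    return correct / generated


for name, (generated, correct) in results.items():
    p = precision(generated, correct)
    print(f"{name}: {correct}/{generated} correct = {p:.1%} precision, "
          f"{generated - correct} incorrect forms")
```

On these counts the second script is much more precise (about 94.8% vs. 29.9%) and also produces more correct forms (471 vs. 107). Whether it is "better" overall also depends on recall, i.e. how many of the forms the word *should* have each script actually generates. That would need a reference list of expected forms, which these counts alone cannot provide.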


From: Mike Maxwell <maxwell at umiacs.umd.edu> To: Eirini LS <eirini_ls at yahoo.com> Cc: "corpora at uib.no" <corpora at uib.no> Sent: Thursday, January 17, 2013 5:11 PM Subject: Re: [Corpora-List] (no subject)

On 1/17/2013 3:09 AM, Eirini LS wrote:
> Thank you very much for your answer. But if I have two scripts for a word, and the first script
> generates 358 units (107 of them correct) and the second script 497 units (471 correct)
> after my hand-validation of the list, which I get using "print lower-words" (this command lets
> me write the output to a .txt file, because UTF-8 text isn't displayed in xfst), does it
> mean that the first script is not a correct one? Which of these two scripts is better? Thank you
> in advance, *Irina L*

Sorry, I don't understand the question; I'm not sure what it means to have two scripts for a word, nor what the units are.

As for UTF8, whether it appears in xfst depends on the settings in whatever command-line processor you're using (Linux bash, Windows' cmd, etc.). That said, for testing purposes (as opposed to, say, debugging a new rule), you generally want to send your output to a file, so you can compare it with previous results.

--
    Mike Maxwell
    maxwell at umiacs.umd.edu
    "My definition of an interesting universe is
    one that has the capacity to study itself."
        --Stephen Eastmond
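The comparison with previous results described above can be done with a small script. Here is a minimal sketch, assuming each run's output was saved as a UTF-8 text file with one generated form per line. The script name `compare_forms.py` and the file names are hypothetical, and the sketch does not depend on xfst itself.

```python
import sys
from pathlib import Path


def read_forms(path: str) -> set[str]:
    """Read one generated form per line as UTF-8, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare(old: set[str], new: set[str]) -> tuple[list[str], list[str]]:
    """Return (forms missing from the new run, forms added in the new run), sorted."""
    return sorted(old - new), sorted(new - old)


def report(old_path: str, new_path: str) -> None:
    """Print a diff-like summary of how the generated forms changed."""
    lost, gained = compare(read_forms(old_path), read_forms(new_path))
    for form in lost:
        print(f"- {form}")
    for form in gained:
        print(f"+ {form}")
    print(f"{len(lost)} removed, {len(gained)} added")


# Only runs when invoked as: python compare_forms.py old.txt new.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    report(sys.argv[1], sys.argv[2])
```

Comparing sets of forms, rather than raw lines, means a change in output order does not show up as a difference. Only forms that actually appear or disappear between runs are reported, and those are what would need re-checking by hand.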
