[Corpora-List] (no subject)

Mike Maxwell maxwell
Thu Jan 17 15:11:32 CET 2013

On 1/17/2013 3:09 AM, Eirini LS wrote:
> Thank you very much for your answer. But if I have two scripts for a word, and the first script
> generates 358 units (107 units - correct) and the second script - 497 units (471 units - correct)
> after my hand-validation of the list, which I get using "print lower-words" (this command helps
> me to provide output in .txt file, because of utf8 code, which isn't visible in xfst), does it
> mean that the first script is not a correct one? Which of these two scripts is better? Thank you
> in advance, *Irina L*

Sorry, I don't understand the question; I'm not sure what it means to have two scripts for a word, nor what the units are.

As for UTF-8, whether it displays correctly in xfst depends on the settings of whatever terminal or command-line shell you're using (bash on Linux, cmd on Windows, etc.). That said, for testing purposes (as opposed to, say, debugging a new rule), you generally want to send your output to a file anyway, so that you can compare it with previous results.
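To illustrate the kind of comparison I mean, here is a minimal sketch in Python. It assumes each run's output was saved as a UTF-8 text file with one word per line; the function names and any file names you pass in are hypothetical, not part of xfst.

```python
from pathlib import Path


def read_words(path):
    """Return the set of non-empty, stripped lines in a UTF-8 text file."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_word_lists(old_path, new_path):
    """Compare two saved word lists.

    Returns a pair (added, removed): words found only in the new file,
    and words found only in the old file, each as a sorted list.
    """
    old_words = read_words(old_path)
    new_words = read_words(new_path)
    added = sorted(new_words - old_words)
    removed = sorted(old_words - new_words)
    return added, removed
```

Calling compare_word_lists on the previous and the current output file gives you the words that appeared and the words that disappeared between runs, which is usually all you need to spot a regression. Reading the files explicitly as UTF-8 sidesteps the terminal display issue.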

Mike Maxwell

maxwell at umiacs.umd.edu

"My definition of an interesting universe is
one that has the capacity to study itself."
--Stephen Eastmond
