[Corpora-List] fast string replacement

Jörg Schuster joerg.schuster at gmail.com
Mon Mar 14 11:07:00 CET 2005

> Two further questions:
>
> - What exactly do you mean by "fast"?

I mean really REALLY fast. My rewriting dictionary currently has 1
million lines, and it will grow. My corpus is 80 GB, and I would like
to be able to re-tag it often.

> - Do you mean string replacement (arbitrary substrings in a line of
> text) or word replacement?

String replacement. I usually build the dictionary so that only true
lexemes are tagged, whether they are single words or multi-word units.
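
To make the intended semantics concrete, here is a minimal sketch of
dictionary-driven string replacement with leftmost-longest matching, so
that a multi-word unit wins over a shorter entry it contains. The names
build_trie and rewrite, the <lex> tag format and the sample entries are
hypothetical, chosen only for illustration. This is not the tool I use,
and plain Python like this would likely be far too slow for an 80 GB
corpus. It only shows the matching behaviour.

```python
def build_trie(rules):
    """Build a character trie from {source: replacement}.

    A node is a dict keyed by character; the key None marks the end of a
    dictionary entry and holds its replacement string.
    """
    root = {}
    for source, target in rules.items():
        node = root
        for ch in source:
            node = node.setdefault(ch, {})
        node[None] = target
    return root


def rewrite(text, trie):
    """Replace dictionary entries in text, leftmost-longest match first."""
    out = []
    i, n = 0, len(text)
    while i < n:
        node, j = trie, i
        best_end, best_target = -1, None
        # Walk the trie as far as the text allows, remembering the
        # longest complete entry seen so far.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_end, best_target = j, node[None]
        if best_end > i:
            out.append(best_target)
            i = best_end
        else:
            # No entry starts here: copy one character and move on.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical sample dictionary: one multi-word unit and one single word.
rules = {
    "New York": "<lex>New York</lex>",
    "New": "<lex>New</lex>",
}
trie = build_trie(rules)
print(rewrite("New York is New", trie))
```

Because this is substring replacement, an entry also matches inside a
longer word ("New" inside "Newton"), which is why the dictionary has to
be built carefully.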

> Schmid's FST toolkit (see http://www.ims.uni-stuttgart.de/~schmid) and
> Steve Abney's cascaded parser CASS (you'll have to search Google for
> the source code).

I will try this. Thank you.

Jörg Schuster
