[Corpora-List] fast string replacement

Jörg Schuster joerg.schuster at gmail.com
Mon Mar 14 11:07:00 CET 2005



> Two further questions:
>
> - What exactly do you mean by "fast"?


I mean really REALLY fast. My rewriting dictionary currently has 1
million lines (and it will grow). My corpus is 80 GB, and I would like
to be able to tag it often.


> - Do you mean string replacement (arbitrary substrings in a line of
> text) or word replacement?


String replacement. I usually build the dictionary so that only true
lexemes are tagged, whether they are single words or multi-word units.
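
To make the task concrete, here is a minimal sketch of dictionary-driven
string replacement using a character trie with leftmost-longest matching.
It is an illustration only, not the method used by any of the tools
mentioned in this thread. The function names (build_trie, rewrite), the
sample entries and the "/NE" and "/ADJ" tag format are all assumptions
made up for the example. Plain Python like this is unlikely to be fast
enough for 80 GB, but it shows the single-pass idea that compiled
automata implement.

```python
def build_trie(rules):
    """Build a character trie from a {source: replacement} mapping.

    The key None marks the end of a dictionary entry and stores its
    replacement. (Hypothetical helper, named for this sketch only.)
    """
    root = {}
    for source, target in rules.items():
        if not source:
            continue  # ignore empty entries; they can never match
        node = root
        for ch in source:
            node = node.setdefault(ch, {})
        node[None] = target
    return root


def rewrite(line, trie):
    """Replace leftmost-longest dictionary matches, scanning left to right."""
    out = []
    i, n = 0, len(line)
    while i < n:
        node = trie
        j = i
        best_end, best_target = -1, None
        # Walk the trie as far as the text allows, remembering the
        # longest complete entry seen so far.
        while j < n and line[j] in node:
            node = node[line[j]]
            j += 1
            if None in node:
                best_end, best_target = j, node[None]
        if best_end != -1:
            out.append(best_target)
            i = best_end
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


# Illustrative entries only; the tag format is an assumption.
rules = {"New York": "New_York/NE", "New": "New/ADJ"}
trie = build_trie(rules)
print(rewrite("New York is new", trie))  # New_York/NE is new
```

Because the longest entry wins, the multi-word unit "New York" is
preferred over the shorter entry "New". The sketch does no word-boundary
checking, so an entry also matches inside a longer word; that is true
substring replacement, and the dictionary has to be built with it in mind.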


> Schmid's FST toolkit (see http://www.ims.uni-stuttgart.de/~schmid) and
> Steve Abney's cascaded parser CASS (you'll have to search Google for
> the source code).


I will try these. Thank you.

Jörg Schuster






