[Corpora-List] fast string replacement

Rob Malouf rmalouf at mail.sdsu.edu
Fri Mar 11 18:49:00 CET 2005


On Fri, 2005-03-11 at 07:28, Stefan Evert wrote:

> If you're really interested in string replacement (probably with some
> additional code to identify word boundaries), you should be looking at
> finite-state transducers. Two open-source solutions I know are Helmut
> Schmid's FST toolkit (see http://www.ims.uni-stuttgart.de/~schmid) and
> Steve Abney's cascaded parser CASS (you'll have to search Google for
> the source code).


You should also consider Gertjan van Noord's FSA Utilities:

http://grid.let.rug.nl/~vannoord/Fsa/fsa.html

It can compile your transducers into Java or C code for portable and/or
efficient execution.
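
For anyone unfamiliar with the idea, here is a rough hand-written sketch of
what transducer-style replacement looks like. It is not output from FSA
Utilities or any of the toolkits mentioned in this thread, and it does not use
their APIs. The names build_trie and transduce and the example rules are
made up for illustration. It assumes leftmost-longest matching and does no
word-boundary handling.

```python
def build_trie(rules):
    """Build a character trie; a node's "out" holds the replacement, if any."""
    root = {"next": {}, "out": None}
    for source, target in rules.items():
        node = root
        for ch in source:
            node = node["next"].setdefault(ch, {"next": {}, "out": None})
        node["out"] = target
    return root


def transduce(text, root):
    """Scan left to right, replacing the longest rule match at each position."""
    result = []
    i = 0
    while i < len(text):
        node = root
        best_end, best_out = None, None
        j = i
        # Follow trie transitions as far as the input allows.
        while j < len(text) and text[j] in node["next"]:
            node = node["next"][text[j]]
            j += 1
            if node["out"] is not None:
                # Remember the longest match seen so far.
                best_end, best_out = j, node["out"]
        if best_end is None:
            # No rule starts here: copy one character through unchanged.
            result.append(text[i])
            i += 1
        else:
            result.append(best_out)
            i = best_end
    return "".join(result)


# Hypothetical rules, for illustration only.
rules = {"colour": "color", "col": "column", "grey": "gray"}
trie = build_trie(rules)
print(transduce("a grey colour col", trie))
```

All rules are applied in a single pass over the text, so the running time
does not grow with the number of rules the way repeated search-and-replace
calls do.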

--
Rob Malouf <rmalouf at mail.sdsu.edu>
Department of Linguistics and Oriental Languages
San Diego State University






