[Corpora-List] fast string replacement

Piao, Songlin s.piao at lancaster.ac.uk
Fri Mar 11 18:22:00 CET 2005

Hi Jörg,

I have put a freely downloadable Java tool on my webpage, which has a function for this purpose:
http://www.lancs.ac.uk/staff/piaosl/research/download/download.htm

You can use it for your purpose as follows:

1) Replace the commas with tabs in the rules (the program uses tabs as the separator).
2) List your rules, with each rule on a separate line, as shown below:
books books/v:3:pres;n:plur
nice nice/adj

3) Go to the menu "Tools" --> "Convert Codes", and click on it to open a file chooser.
4) Choose one or multiple files that you want to convert.

The program will then replace all matching items in the files with their corresponding substitutes.
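
Step 1 above (commas to tabs) can also be scripted rather than done by hand. Here is a minimal sketch in Python, assuming each rule sits on its own line and the first comma separates the pattern from its substitute. The function name commas_to_tabs is hypothetical and is not part of the Java tool:

```python
def commas_to_tabs(rule_lines):
    """Turn 'pattern, substitute' lines into 'pattern<TAB>substitute' lines."""
    converted = []
    for line in rule_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first comma only, so a substitute may itself contain commas.
        # Assumption: every non-blank line contains at least one comma.
        pattern, _, substitute = line.partition(",")
        converted.append(pattern.strip() + "\t" + substitute.strip())
    return converted


rules = ["books, books/v:3:pres;n:plur", "nice, nice/adj"]
for rule in commas_to_tabs(rules):
    print(rule)
```

The printed lines can be saved to a file and used as the rule list in step 2.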

Since it is a Java program, it should run under Linux.

I tried it with your sample rules and sentence, and I got exactly the result you hoped for.

Scott Piao


From: owner-corpora at lists.uib.no on behalf of js at cis.uni-muenchen.de
Sent: Fri 11/03/2005 14:43
To: CORPORA at hd.uib.no
Subject: [Corpora-List] fast string replacement


I am looking for a program that

- takes as input a string (!) rewriting dictionary and a corpus
- applies all rewriting rules to all lines of the corpus
- is fast, stable and free
- works under Linux


Some rewriting rules:

books, books/v:3:pres;n:plur
nice, nice/adj

A "corpus" before transduction:

John reads nice books.

The same corpus after transduction:

John reads nice/adj books/v:3:pres;n:plur
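
To make the intended behaviour concrete, here is a minimal sketch in Python of this kind of dictionary-driven rewriting. It is only an illustration under stated assumptions, not one of the programs discussed in this thread: the first rule is assumed to be keyed on "books" (the form that occurs in the sample sentence), entries are assumed to match whole tokens only, the dictionary is assumed to be non-empty, and the name build_rewriter is hypothetical. Unlike the sample output above, the sketch leaves punctuation untouched.

```python
import re


def build_rewriter(rules):
    """Return a function that applies all rewriting rules to a line in one pass."""
    # Longest entries first, so a longer entry wins over one that is its prefix.
    keys = sorted(rules, key=len, reverse=True)
    # One combined pattern; the lookarounds keep matches from starting or
    # ending inside a longer word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    # Replaced text is not scanned again, so rules cannot re-trigger on
    # their own output.
    return lambda line: pattern.sub(lambda m: rules[m.group(0)], line)


rules = {"books": "books/v:3:pres;n:plur", "nice": "nice/adj"}
rewrite = build_rewriter(rules)
print(rewrite("John reads nice books."))
```

Compiling all entries into a single pattern means each corpus line is scanned once, however many rules there are, though for very large dictionaries a dedicated multi-pattern matcher may be faster.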

Does anyone know such a program?

Jörg Schuster
