Fwd: [Corpora-List] Application for lemmatising corpora

Grzegorz Chrupała grzegorz at pithekos.net
Fri Mar 23 18:45:01 CET 2007

Something like the following Ruby script would do this (where one line
in the file with lemmas looks like this: "GO went gone going"):


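# Build a table mapping each inflected form to its lemma.
def read_dict(path)
  dict = Hash.new
  File.open(path) do |f|
    while line = f.gets
      words = line.split
      lemma = words.shift            # first word on the line is the lemma
      words.each { |w| dict[w] = lemma }
    end
  end
  return dict
end

# Print each input line with known forms replaced by their lemmas;
# words not in the table are left unchanged.
def lemmatize(dict, inp)
  while line = inp.gets
    puts(line.split.map { |w| dict[w] || w }.join(' '))
  end
end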
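
Note that splitting on whitespace leaves punctuation attached to words, so
"went," or "market." would not be found in the table. A minimal sketch of one
way around this, assuming it is acceptable to match runs of letters and to
look forms up case-insensitively (both are my assumptions, and the names
build_dict and lemmatize_line are made up for illustration):

```ruby
# Hypothetical helper: build a form => lemma table from lines such as
# "GO go went gone going" (lemma first, then its forms).
def build_dict(lines)
  dict = {}
  lines.each do |line|
    lemma, *forms = line.split
    forms.each { |w| dict[w] = lemma }
  end
  dict
end

# Replace each run of letters with its lemma if one is known, leaving
# punctuation and spacing untouched. The lookup lowercases the word first
# (an assumption: forms in the lemma file are stored in lower case).
def lemmatize_line(dict, line)
  line.gsub(/[[:alpha:]]+/) { |w| dict[w.downcase] || w }
end

dict = build_dict(["GO go went gone going"])
puts lemmatize_line(dict, "Yesterday I went to the market.")
# prints: Yesterday I GO to the market.
```

Because only runs of letters are matched, a word containing an apostrophe
(e.g. "don't") is treated as two separate pieces, which may or may not be
what you want for your texts.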


On 3/23/07, Hunter, Duncan <D.I.Hunter at warwick.ac.uk> wrote:

> Hi all,
>
> Thanks, I have been looking at the applications suggested. Unfortunately,
> what I'm looking for is so simple that it might not be something that many
> people actually use. My texts are untagged, and I'd like to keep them that
> way for the moment. I actually want the lemmas to be inserted right there in
> the text, so you get for example; 'Yesterday I GO to the market.'
>
> I guess what I'm looking for is a kind of find/replace application that can
> read off a file of (lemmatising) replacements like GO>go, went, gone,
> going...!
>
> Apologies for not making this clearer!
>
> Duncan Hunter
