Fwd: [Corpora-List] Application for lemmatising corpora

Grzegorz Chrupała grzegorz at pithekos.net
Fri Mar 23 18:45:01 CET 2007


Something like the following Ruby script would do this, assuming each line
of the lemma file looks like "GO went gone going" (the lemma first, then its
forms). Give it the lemma file as its argument and the text on standard input:

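#!/usr/bin/ruby

# Read the lemma file: each line holds a lemma followed by its forms,
# e.g. "GO went gone going". Returns a hash mapping form => lemma.
def read_dict(path)
  dict = {}
  File.foreach(path) do |line|   # File.foreach closes the file when done
    words = line.split
    lemma = words.shift
    words.each { |w| dict[w] = lemma }
  end
  dict
end

# Replace every whitespace-separated token found in the dictionary by its
# lemma; tokens not in the dictionary are passed through unchanged.
def lemmatize(dict, inp)
  inp.each_line do |line|
    puts line.split.map { |w| dict[w] || w }.join(' ')
  end
end

lemmatize(read_dict(ARGV[0]), STDIN)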
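One limitation worth noting: split cuts on whitespace only, so a token such as "went," or a sentence-initial "Went" will not match the entry "went", and the original spacing is collapsed. A minimal sketch of a variant, assuming the same lemma-file format, rewrites only runs of letters with gsub so punctuation and spacing survive. The name lemmatize_line and the lowercase fallback are my own additions, not part of the script above.

```ruby
#!/usr/bin/ruby
# Sketch: same lemma file format ("GO went gone going"), but only runs of
# letters are replaced, so punctuation and spacing are left untouched.

# Build a form => lemma hash from the lemma file.
def read_dict(path)
  dict = {}
  File.foreach(path) do |line|
    lemma, *forms = line.split
    forms.each { |w| dict[w] = lemma }
  end
  dict
end

# Replace every letter run that is a known form. The exact spelling is tried
# first, then a lowercased one, so that "Went" at the start of a sentence
# also matches the entry "went". (Hypothetical helper name.)
def lemmatize_line(dict, line)
  line.gsub(/[[:alpha:]]+/) { |w| dict[w] || dict[w.downcase] || w }
end

# Only run as a filter when a lemma file is given on the command line.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  dict = read_dict(ARGV[0])
  STDIN.each_line { |line| print lemmatize_line(dict, line) }
end
```

It is run the same way as the original, e.g. ruby lemmatize.rb lemmas.txt < corpus.txt (file names hypothetical). Note that contractions such as "don't" are split at the apostrophe by this pattern, so they would need separate handling.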


On 3/23/07, Hunter, Duncan <D.I.Hunter at warwick.ac.uk> wrote:

> Hi all,
>
> Thanks, I have been looking at the applications suggested. Unfortunately,
> what I'm looking for is so simple that it might not be something that many
> people actually use. My texts are untagged, and I'd like to keep them that
> way for the moment. I actually want the lemmas to be inserted right there in
> the text, so you get, for example: 'Yesterday I GO to the market.'
>
> I guess what I'm looking for is a kind of find/replace application that can
> read off a file of (lemmatising) replacements like GO>go, went, gone,
> going...!
>
> Apologies for not making this clearer!
>
> Duncan Hunter
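If the replacement list is kept in the layout from the quoted message, one entry per line such as "GO>go, went, gone, going", the read_dict function could be swapped for a parser along these lines. This is a minimal sketch assuming exactly that layout (lemma, then ">", then comma-separated forms); the name read_replacements is hypothetical.

```ruby
# Sketch: parse lines of the form "GO>go, went, gone, going" into a
# form => lemma hash, the same shape read_dict returns.
def read_replacements(path)
  dict = {}
  File.foreach(path) do |line|
    lemma, forms = line.split(">", 2)
    next if forms.nil?               # skip blank lines and lines without ">"
    lemma = lemma.strip
    forms.split(",")
         .map(&:strip)
         .reject(&:empty?)
         .each { |w| dict[w] = lemma }
  end
  dict
end
```

With this in place, the last line of the script would become lemmatize(read_replacements(ARGV[0]), STDIN).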


--
'gʒɛgɔʃ

