Fwd: [Corpora-List] Application for lemmatising corpora

Matthew Purver mpurver at stanford.edu
Fri Mar 23 20:01:01 CET 2007


And if you're looking for a file containing those lemmas, I once produced a
very similar one for English from the Oxford Advanced Learner's
Dictionary. It's available here:

http://www.stanford.edu/~mpurver/software.html

Grzegorz Chrupała wrote:

> Something like the following Ruby script would do this (where one line
> in the file with lemmas looks like this: "GO went gone going"):
>
> #!/usr/bin/ruby
>
> def read_dict(path)
>   f = File.open(path)
>   dict = Hash.new
>   while line = f.gets
>     words = line.split
>     lemma = words.shift
>     words.each { |w| dict[w] = lemma }
>   end
>   return dict
> end
>
> def lemmatize(dict, inp)
>   while line = inp.gets
>     puts(line.split.map { |w| dict[w] || w }.join(' '))
>   end
> end
>
> lemmatize(read_dict(ARGV[0]), STDIN)
>
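
Because the script above splits on whitespace, a token such as "went," or
"Went" will not match the entry "went", and the original spacing of each
line is not preserved. A minimal sketch of one way around this, assuming
the same "GO went gone going" file format. The method names build_dict and
lemmatize_line, the file name lemmatise.rb, and the decision to match
case-insensitively are illustrative assumptions, not part of the script
above:

```ruby
# Build a form => lemma table from lines like "GO went gone going".
# Assumption: the first field is the lemma, written the way it should
# appear in the output; the remaining fields are its inflected forms.
def build_dict(lines)
  dict = {}
  lines.each do |line|
    lemma, *forms = line.split
    next if lemma.nil?                       # skip blank lines
    forms.each { |form| dict[form.downcase] = lemma }
  end
  dict
end

# Replace each alphabetic word by its lemma, leaving punctuation and
# spacing untouched. Words that are not in the table are kept as-is.
def lemmatize_line(dict, text)
  text.gsub(/[[:alpha:]]+/) { |word| dict.fetch(word.downcase, word) }
end

# Hypothetical command-line use: ruby lemmatise.rb lemmas.txt < input.txt
if ARGV[0]
  dict = build_dict(File.foreach(ARGV[0]))
  STDIN.each_line { |line| print lemmatize_line(dict, line) }
end
```

As with the original script, the lemma's own base form ("go") is only
replaced if it is also listed among the forms on its line. Words containing
an apostrophe are treated as two separate words by this pattern.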

> On 3/23/07, Hunter, Duncan <D.I.Hunter at warwick.ac.uk> wrote:
>>
>> Hi all,
>>
>> Thanks, I have been looking at the applications suggested. Unfortunately,
>> what I'm looking for is so simple that it might not be something that many
>> people actually use. My texts are untagged, and I'd like to keep them that
>> way for the moment. I actually want the lemmas to be inserted right there in
>> the text, so you get for example; 'Yesterday I GO to the market.'
>>
>> I guess what I'm looking for is a kind of find/replace application that can
>> read off a file of (lemmatising) replacements like GO>go, went, gone,
>> going...!
>>
>> Apologies for not making this clearer!
>>
>> Duncan Hunter
>>
>
> --
> 'gʒɛgɔʃ
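
For a replacement file written in the "GO>go, went, gone, going" style
mentioned in the original request, the table could be read along the
following lines. This is only a sketch: it assumes one entry per line, with
the lemma before the ">" and the forms separated by commas, and the method
name parse_replacements is hypothetical.

```ruby
# Parse lines of the assumed form "LEMMA>form1, form2, ..." into a
# form => lemma hash. Lines without a ">" or without a lemma are ignored.
def parse_replacements(lines)
  dict = {}
  lines.each do |line|
    lemma, forms = line.strip.split('>', 2)
    next if forms.nil? || lemma.strip.empty?
    forms.split(',').each do |form|
      form = form.strip
      dict[form] = lemma.strip unless form.empty?
    end
  end
  dict
end
```

The resulting hash has the same form-to-lemma shape as the one returned by
read_dict in the quoted script, so it could be passed to lemmatize
unchanged.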


--
Matthew Purver <mpurver at stanford.edu>
Computational Semantics Laboratory, CSLI, Stanford




