[Corpora-List] Application for lemmatising corpora

Hunter, Duncan D.I.Hunter at warwick.ac.uk
Fri Mar 23 17:40:01 CET 2007


Hi all,

Thanks, I have been looking at the suggested applications. Unfortunately, what I'm looking for is so simple that it might not be something many people actually use. My texts are untagged, and I'd like to keep them that way for the moment. I actually want the lemmas inserted right there in the text, so that you get, for example: 'Yesterday I GO to the market.'

I guess what I'm looking for is a kind of find/replace application that can read off a file of (lemmatising) replacements like GO>go, went, gone, going...!

Apologies for not making this clearer!

Duncan Hunter
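
The find/replace approach described above can be sketched in a few lines of Python. This is a minimal sketch, not an existing tool: the function names `load_lemma_list` and `lemmatise` are hypothetical, and the rule format `LEMMA>form, form, ...` is taken from the example above. Yasumasa Someya's list is assumed (not verified here) to use a similar `lemma -> form,form` layout, which the parser also accepts; check the actual file first. Because every form goes into one dictionary and the text is scanned once, a long lemma list costs one lookup per word rather than one find/replace pass per list entry.

```python
import re

# Matches a word, allowing internal apostrophes (e.g. "don't").
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def load_lemma_list(lines):
    """Build a {form: LEMMA} dictionary from lines such as 'GO>go, went, gone'.

    Lines of the form 'lemma -> form,form' are accepted too. Blank lines and
    lines starting with ';' or '#' are skipped (assumed comment markers).
    """
    form_to_lemma = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith((";", "#")):
            continue
        head, sep, tail = line.partition(">")
        if not sep:
            continue  # no separator: not a rule line
        lemma = head.rstrip("- ").strip().upper()
        for form in tail.split(","):
            form = form.strip().lower()
            if form:
                # First rule wins if a form appears under several lemmas.
                form_to_lemma.setdefault(form, lemma)
    return form_to_lemma


def lemmatise(text, form_to_lemma):
    """Replace each known word form in `text` with its upper-case lemma."""
    def swap(match):
        word = match.group(0)
        return form_to_lemma.get(word.lower(), word)
    return WORD.sub(swap, text)


if __name__ == "__main__":
    table = load_lemma_list(["GO>go, went, gone, going"])
    print(lemmatise("Yesterday I went to the market."  , table))
    # → Yesterday I GO to the market.
```

To process a whole corpus, pass an open lemma-list file (e.g. a hypothetical `lemmas.txt`) to `load_lemma_list` and write the result of `lemmatise` for each text back to disk. Ambiguous forms (say, "left") simply take the first lemma listed, since no part-of-speech information is used.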

________________________________

From: owner-corpora at lists.uib.no on behalf of jasper holmes
Sent: Fri 23/03/2007 09:58
To: corpora at uib.no
Subject: Re: [Corpora-List] Application for lemmatising corpora



You could try WMatrix: http://www.comp.lancs.ac.uk/ucrel/wmatrix/
You need to get a username (one-month free trial), and then you do it
online. It does tagging and lemmatising and also some analysis
(frequencies, concordances, key words).

Jasper
http://go.warwick.ac.uk/BAWE


On 3/22/07, Oliver Strunk <strunk at ub.edu> wrote:

> Maybe the TreeTagger from IMS Stuttgart?
>
> http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
>
> It is available for Linux and Windows; the output includes POS and
> lemmatized text and can easily be converted.
>
> Oliver Strunk
> LADA - University of Barcelona
>
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Hunter, Duncan
> Sent: Thursday, March 22, 2007 11:45 PM
> To: corpora at uib.no
> Subject: [Corpora-List] Application for lemmatising corpora
>
> Hi All,
>
> Could anybody suggest a small, downloadable and free application for
> lemmatising texts? For various reasons I need the texts I am examining to be
> in lemmatised form before analysis with corpus tools. It's a small
> collection of texts, a few hundred shortish (article-sized) ones in text
> format.
>
> I've had some trouble with the software I'm using at the moment. It tends to
> 'stick' when given a formidable lemma list to process (I'm using Yasumasa
> Someya's fairly lengthy one).
>
> All the best,
>
> Duncan Hunter
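
The TreeTagger suggestion notes that its output "can easily be converted"; a short Python sketch can turn that output into the inline-lemma text asked for in the follow-up. It assumes the common one-token-per-line output with lemma output enabled: three tab-separated columns (token, tag, lemma), `<unknown>` for unrecognised words and `|` between alternative lemmas. Check these conventions against the version you install. The function name `treetagger_to_text` is hypothetical, the sample lines and tags are made up for illustration, and joining tokens with spaces does not restore the original whitespace.

```python
import re

# Removes the space that joining tokens puts before closing punctuation.
SPACE_BEFORE_PUNCT = re.compile(r" ([.,;:!?)])")


def treetagger_to_text(lines):
    """Rebuild running text from 'token<TAB>tag<TAB>lemma' lines.

    The lemma is written in upper case wherever it differs from the token;
    '<unknown>' lemmas leave the token unchanged, and only the first of
    several '|'-separated lemmas is used (both assumed output conventions).
    """
    words = []
    for raw in lines:
        parts = raw.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        token, _tag, lemma = parts
        lemma = lemma.split("|")[0]
        if lemma == "<unknown>" or lemma.lower() == token.lower():
            words.append(token)
        else:
            words.append(lemma.upper())
    return SPACE_BEFORE_PUNCT.sub(r"\1", " ".join(words))


if __name__ == "__main__":
    sample = [
        "Yesterday\tNN\tyesterday",
        "I\tPP\tI",
        "went\tVVD\tgo",
        "to\tTO\tto",
        "the\tDT\tthe",
        "market\tNN\tmarket",
        ".\tSENT\t.",
    ]
    print(treetagger_to_text(sample))
    # → Yesterday I GO to the market.
```

Unlike a plain lemma list, this route lets the tagger's part-of-speech analysis choose the lemma, at the cost of running the tagger over the corpus first.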



