We have a Serbian corpus in the Sketch Engine, so all she needs to do is upload her corpus and then run 'keywords' to compare hers with the reference.
The one that is currently available is not lemmatised, so comparisons there would be wordform-based. However, we are lemmatising and POS-tagging a newer, bigger dataset (courtesy of Nikola Ljubešić) as we speak, so we can make that available too; then she can get key lemmas. If you or she ask, we can make a big sample of the lemmatised material available at a day or two's notice.
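For anyone curious what a wordform-based keyword comparison amounts to, here is a minimal Python sketch. It is an illustration only and not necessarily what the Sketch Engine computes internally: it assumes both corpora are already tokenised, normalises frequencies to per-million, and ranks words by a smoothed ratio of focus frequency to reference frequency. The function names, the `smoothing` parameter and the toy token lists are all made up for the example.

```python
from collections import Counter


def per_million(counts):
    """Convert raw counts to frequencies per million tokens."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def keywords(focus_tokens, ref_tokens, smoothing=1.0, top=10):
    """Rank wordforms of the focus corpus against a reference corpus.

    Score = (focus per-million + smoothing) / (reference per-million + smoothing).
    The smoothing constant (an assumed default of 1.0) avoids division by zero
    for words that are absent from the reference corpus.
    """
    focus = per_million(Counter(focus_tokens))
    ref = per_million(Counter(ref_tokens))
    scores = {
        word: (freq + smoothing) / (ref.get(word, 0.0) + smoothing)
        for word, freq in focus.items()
    }
    # Highest score first: words most characteristic of the focus corpus.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Toy, invented token lists standing in for the two corpora.
focus_sample = ["vezu", "devojku", "vezu", "i", "je"]
ref_sample = ["je", "i", "je", "u", "i", "je"]

for word, score in keywords(focus_sample, ref_sample, top=3):
    print(word, round(score, 2))
```

With lemmatised input the same comparison would yield key lemmas rather than key wordforms; only the token lists change.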
On 22 February 2013 15:39, Martin Wynne <martin.wynne at it.ox.ac.uk> wrote:
> I would like to pose a question on behalf of a student who would like to
> generate keywords by comparing her corpus of contemporary online personal
> ads in Serbian with a reference corpus.
> Does anyone know of any freely available wordlists for the modern Serbian
> language? Ideally, we'd like a lemma frequency list generated from a
> general reference corpus, although lists from various other text types
> could be useful. We'd be interested if there is a corpus available to use
> as well.
> Many thanks for any help.
> Martin Wynne
> IT Services, University of Oxford
> Oxford e-Research Centre
> Faculty of Linguistics, Philology and Phonetics
> martin.wynne at it.ox.ac.uk
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================