[Corpora-List] Corpora Digest, Vol 62, Issue 26 - open source tools for German language

Anne Schumann anne.schumann at Tilde.lv
Wed Aug 29 12:15:07 CEST 2012


Sree,

For morphological analysis, take a look at RFTagger (http://www.ims.uni-stuttgart.de/projekte/corplex/RFTagger/). I also know of two other parsers: BitPar (http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/BitPar.html) and ParZu (https://github.com/rsennrich/ParZu).

Best, Anne

Anne-Kathrin Schumann
PhD student
University of Vienna
Tilde

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of corpora-request at uib.no
Sent: Wednesday, August 29, 2012 1:00 PM
To: corpora at uib.no
Subject: Corpora Digest, Vol 62, Issue 26

Today's Topics:

1. 2 research associate positions (final reminder - deadline 2nd Sept) (Orasan, Constantin)
2. open source tools for German Language (sree ganesh)
3. Re: open source tools for German Language (Ivelina Nikolova)
4. Re: open source tools for German Language (Torsten Zesch)
5. Re: open source tools for German Language (manaal faruqui)
6. NLP Research Associate post at UCREL, Lancaster University (Rayson, Paul)
7. Re: open source tools for German Language (Thomas Proisl)
8. Corpus Linguistics in the South 4: Hands-on workshop (Charlotte Taylor)

----------------------------------------------------------------------

Message: 1
Date: Tue, 28 Aug 2012 14:06:45 +0000
From: "Orasan, Constantin" <C.Orasan at wlv.ac.uk>
Subject: [Corpora-List] 2 research associate positions (final reminder - deadline 2nd Sept)
To: "corpora at lists.uib.no" <corpora at lists.uib.no>, "cluk at dcs.shef.ac.uk" <cluk at dcs.shef.ac.uk>

[Apologies for multiple postings]

The Research Group in Computational Linguistics (http://clg.wlv.ac.uk) at the University of Wolverhampton invites applications for two research associate posts in the DVC project (http://clg.wlv.ac.uk/projects/DVC/)

Salary: £28,401 - £31,020 pa (level of appointment dependent on qualifications and experience)
Duration: These are temporary, fixed-term appointments for a maximum of three years (dependent on start date of contract).
Application deadline: 2nd September 2012

RA1: Research Associate in Computational Linguistics (REF: A5910)

To work as part of a team to research computational linguistics approaches to investigate the relationship between the meaning and the use of English verbs. This is a project funded by AHRC and the successful applicant may be required to attend meetings in the Czech Republic. Applicants should have a PhD in Information Science, Computer Science or Natural Language Processing (or equivalent experience) and proven research experience in these fields. Applicants must be familiar with corpus linguistics and should have experience of at least some of the following fields: textual entailment, semantic role labelling and word sense disambiguation. Knowledge of a programming language is also essential. Experience of web application development, corpus annotation and using NLP tools is desirable. Applicants should feel comfortable with understanding linguistic theories such as Sinclair's and/or Hanks's principles of corpus pattern analysis.

RA2: Research Associate in Lexicography (REF: A5911)

To work as part of a team to research computational linguistics approaches to investigate the relationship between the meaning and the use of English verbs. This is a project funded by AHRC and the successful applicant may be required to attend meetings in the Czech Republic. Applicants should have a PhD in Corpus Linguistics and/or equivalent practical experience in Lexical Analysis for publication. They must have experience of contextual analysis of meaning, collocational preferences, and lexical semantics, along with knowledge of corpus linguistics and dictionary building. Experience of working with ontologies and/or semantic types is desirable, as is knowledge of Sinclair's and/or Hanks's principles of corpus pattern analysis, familiarity with corpus annotation and the use of annotation tools, and exposure to computational linguistics.

For informal enquiries please contact Alison Carminke, alison.carminke at wlv.ac.uk quoting the reference number.

For detailed further particulars and an application form visit our website: http://www.wlv.ac.uk

Alternatively please contact the Personnel Services Department, University of Wolverhampton, Molineux Street, Wolverhampton WV1 1SB. Tel: 01902 321049 (ansaphone). For hearing impaired candidates our minicom number is 01902 321249. Email address: per at wlv.ac.uk Visit our website at http://www.wlv.ac.uk/

The University is eager to attract larger numbers of applications from groups of people currently under-represented in the staff population, especially from women and people from ethnic minority groups.

Established by Prof Mitkov in 1998, the Research Group in Computational Linguistics delivers cutting-edge research in a number of NLP areas such as anaphora resolution, automatic summarisation, question answering, multilingual text processing, multiple-choice question generation and text simplification. The results from the latest Research Assessment Exercise announced on 17 December 2008 confirm the Research Group in Computational Linguistics as one of the top performers in UK research. The research group was ranked joint 3rd with 2 more universities in the Unit of Assessment 'Linguistics'. According to the league tables of the Guardian, The Times and Research Fortnight, research in Linguistics at the University of Wolverhampton is one of the top 6 in the UK.

--
Dr. Constantin Orasan <C.Orasan at wlv.ac.uk>
Senior Lecturer in Computational Linguistics
Deputy Head of the Research Group in Computational Linguistics
Research Group in Computational Linguistics
http://www.wlv.ac.uk/~in6093/
University of Wolverhampton

------------------------------

Message: 2
Date: Tue, 28 Aug 2012 17:05:55 +0200
From: sree ganesh <sganeshhcu at gmail.com>
Subject: [Corpora-List] open source tools for German Language
To: corpora at uib.no

Dear Members,

I would like to get some suggestions from you on the following:

1. Are there any open source morphological analysers and parsers for the German language?
2. I would like to extract noun phrases from a German corpus. Any suggestions on this?

Regards
Sri

------------------------------

Message: 3
Date: Tue, 28 Aug 2012 18:28:47 +0300
From: Ivelina Nikolova <iva at lml.bas.bg>
Subject: Re: [Corpora-List] open source tools for German Language
To: corpora at uib.no

Hi Sri,

I had the same problem and used this chunker: http://www.semanticsoftware.info/munpex (see http://www.semanticsoftware.info/munpex#Installation for installation). You may find it useful too.

Best, Ivelina


------------------------------

Message: 4
Date: Tue, 28 Aug 2012 18:35:38 +0000
From: Torsten Zesch <zesch at ukp.informatik.tu-darmstadt.de>
Subject: Re: [Corpora-List] open source tools for German Language
To: "'corpora at uib.no' (corpora at uib.no)" <corpora at uib.no>

Dear Sri,

1. StanfordParser (http://nlp.stanford.edu/software/lex-parser.shtml) and mate-tools (http://code.google.com/p/mate-tools/) come with pre-packaged models for German.

2. Try TreeTaggerChunker (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/)

If you are working with Java, the DKPro Core framework (http://code.google.com/p/dkpro-core-asl/) comes with easy-to-use wrappers for TreeTagger and StanfordParser. An integration of the mate-tools is in preparation.

-Torsten


------------------------------

Message: 5
Date: Tue, 28 Aug 2012 20:17:10 -0400
From: manaal faruqui <manaalfar at gmail.com>
Subject: Re: [Corpora-List] open source tools for German Language
To: sree ganesh <sganeshhcu at gmail.com>
Cc: corpora at uib.no

Hi Sree,

A German NER tool is also available here, in case you want to further divide the noun phrases into categories: http://www.nlpado.de/~sebastian/software/ner_german.shtml

Best, Manaal


------------------------------

Message: 6
Date: Wed, 29 Aug 2012 07:26:41 +0000
From: "Rayson, Paul" <p.rayson at lancaster.ac.uk>
Subject: [Corpora-List] NLP Research Associate post at UCREL, Lancaster University
To: "corpora at uib.no" <corpora at uib.no>

Research Associate in Natural Language Processing of Corporate Financial Communications

School of Computing and Communications
Salary: £25,251 to £29,249
Closing Date: Friday 21 September 2012
Interview Date: To be confirmed
Reference: A489

Applications are invited for a Research Associate position in natural language processing as part of an interdisciplinary team working on Corporate Financial Communications within the Department of Accounting and Finance and the School of Computing and Communications (SCC) at Lancaster University.

You should have a first degree or Master's degree in Computer Science, Computational Linguistics, Text Mining, or a related field; and a PhD in the area of corpus-based analysis, natural language processing or a closely related subject. You should also possess suitable software development skills and demonstrate the ability to work as part of a team as well as the capability to integrate diverse multi-disciplinary requirements into the design of the natural language processing tools to be developed.

Candidates are encouraged to make informal enquiries to the project investigators Dr Paul Rayson (p.rayson at lancaster.ac.uk) in SCC or Prof Steven Young (s.young at lancaster.ac.uk) in Accounting and Finance.

For more details, see http://hr-jobs.lancs.ac.uk/Vacancy.aspx?ref=A489

Dr. Paul Rayson Director of UCREL and Senior Lecturer in Computer Science Faculty of Science and Technology Director of International Teaching Partnerships School of Computing and Communications, Infolab21, Lancaster University, Lancaster, LA1 4WA, UK. Web: http://www.comp.lancs.ac.uk/~paul/ Tel: +44 1524 510357 Fax: +44 1524 510492


------------------------------

Message: 7
Date: Wed, 29 Aug 2012 09:42:17 +0200
From: Thomas Proisl <tsproisl at linguistik.uni-erlangen.de>
Subject: Re: [Corpora-List] open source tools for German Language
To: sree ganesh <sganeshhcu at gmail.com>
Cc: corpora at uib.no

Dear Sri,


> 1. Are there any open source Morphological analysers and parsers for
> German language?

Morphisto (https://code.google.com/p/morphisto/) might fit your needs. Here is a quote from its website:


> Morphisto is a morphological analyzer and generator for German
> wordforms. The basis of Morphisto is the open-source SMOR morphology
> for the German language developed by the University of Stuttgart (GPL
> v2) for which a free lexicon is provided under the Creative Commons
> 3.0 BY-SA Non-Commercial license.

Best regards, Thomas

-- Department Germanistik und Komparatistik Professur für Computerlinguistik Bismarckstr. 6, 91054 Erlangen

Institut für Anglistik und Amerikanistik Lehrstuhl für Anglistik, insbesondere Linguistik Bismarckstr. 1, 91054 Erlangen

Fon: +49 9131 85-25908; Fax: +49 9131 85-29251
http://www.linguistik.uni-erlangen.de/~tsproisl/

------------------------------

Message: 8
Date: Wed, 29 Aug 2012 10:20:30 +0100
From: "Charlotte Taylor" <Charlotte.Taylor at port.ac.uk>
Subject: [Corpora-List] Corpus Linguistics in the South 4: Hands-on workshop
To: <corpora at uib.no>

We are pleased to announce that the next Corpus Linguistics in the South will be hosted by the University of Portsmouth on Saturday 10 November. It will be a practical hands-on workshop with software which may be useful to corpus linguists. The programme and description of the sessions are copied below.

As always, attendance is free, but places are limited and will be assigned on a first-come, first-served basis. If you would like to attend, please email charlotte.taylor at port.ac.uk. Please also specify whether you would like to join us for lunch at a local cafe/restaurant (max £10).

Programme

9.15 Welcome coffee
9.30 Sketch Engine: Advanced workshop
     Adam Kilgarriff, Lexcom Computing, Brighton
11.00 EXMARaLDA (Extensible Markup Language for Discourse Annotation)
     Daniel Jettka, Hamburg Centre for Spoken Corpora, Germany
13.00 Lunch
14.15 CHILDES (Child Language Data Exchange System)
     Kevin McManus, University of Southampton
15.45 Unix for Corpus Users
     John Williams, University of Portsmouth
17.15 Arrangement of next two Corpus Linguistics in the South events & Close

Sketch Engine: Advanced Workshop

This will be an opportunity for people with some experience of Sketch Engine to see and try out some more advanced features, and also to ask any questions, particularly of the 'How do I do X?' variety. As with most software, most users are only aware of a small fraction of what the software offers, and find it rewarding to have their repertoire extended. My usual experience with workshops of this kind is that there are many instances of wide-eyed looks which say "Ah, so THAT is how you do that!" Come prepared with any queries or reports you want to be able to do, but are not sure how, and we'll work out how together in the workshop.

Introduction to EXMARaLDA

The workshop will introduce EXMARaLDA ("Extensible Markup Language for Discourse Annotation"), a system of concepts, data formats, and tools for the computer-assisted transcription and annotation of spoken language, and for the construction and analysis of spoken language corpora. During the workshop three related tools will be introduced: (1) the Partitur Editor, a tool for inputting, editing, and outputting transcriptions in partitur (musical score) notation; (2) the Corpus Manager (CoMa), which is designed to merge transcripts created with the Partitur Editor with their corresponding recordings into corpora and to enrich them with metadata; and (3) the query tool EXAKT ("EXMARaLDA Analysis and Concordancing Tool") for searching transcribed and annotated phenomena in an EXMARaLDA corpus. After a brief introduction, the participants will have the chance to gain some practical experience with the tools. The focus will presumably be on the transcription and annotation of audio and/or video data in the Partitur Editor, so please feel free to bring along your own data for testing. To find out more about EXMARaLDA visit http://www.exmaralda.org/en_index.html

Introduction to CHILDES

The overall purpose of the session is to provide practical, hands-on experience of the CHILDES database and its tools for researchers working in any field of language acquisition. In particular, we aim: a) to introduce researchers unfamiliar with CHILDES, but planning to do empirical work, to the basics of transcription and coding of new and existing material and to the tools available to analyse data; b) to help researchers in addressing specific research questions within CHILDES (e.g. use of part-of-speech tagger, searches on morphosyntactic lines, etc).

Introduction to Unix for Corpus Users

This workshop is intended for corpus users with little or no knowledge of the Unix command line who would like to extend their repertoire of searching, sorting, and synthesizing techniques beyond those that are available through the standard corpus-query software packages (SketchEngine, AntConc, Wordsmith, etc). The workshop will be divided into three phases:

a) Some basics: options, input & output, pipes, file management, aliases, .rc files.
b) The most useful Unix commands for corpus linguists: cat, grep, sed, sort, uniq. (We will chain some of these together to create a customized word list with frequencies.) Some of these commands are integrated into the standard packages, but by using them at the command line their range and flexibility can be greatly extended. This part of the workshop will also include a discussion of regular expressions.
c) It is hoped to be able to demonstrate a simple Unix shell script (program) which will convert batches of .doc and .pdf files to .txt, to aid participants in building their own corpora. This tool will be available to take away (or to be sent by email) at the end of the workshop.
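For readers who want a feel for the "word list with frequencies" mentioned under (b), here is a minimal sketch in Python rather than shell. It is not the workshop's material: it only mimics what a chain of the listed commands would do (read text, split it into words, count duplicates, sort by count). The function name `word_frequencies`, the tokenisation rule and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    # Lower-case the text and extract runs of letters. This tokenisation is a
    # simplifying assumption; a real corpus would likely need more careful rules.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically, in the spirit of
    # piping sorted words through uniq -c and sorting the result by number.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical sample text standing in for a corpus file.
    sample = "Der Hund sieht die Katze. Die Katze sieht den Hund nicht."
    for word, count in word_frequencies(sample):
        print(f"{count:4d} {word}")
```

The regular expression `[^\W\d_]+` matches runs of Unicode letters, so umlauts in German text are kept inside words. Sorting ties alphabetically is a design choice that makes the output reproducible.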

-------------------------------------------------- Year 1 Tutor, SLAS Senior Lecturer in English Language and Linguistics

School of Languages and Area Studies University of Portsmouth Park Building King Henry I Street Portsmouth PO1 2DZ

Room 4.31, Tel. 023 92 846161 http://www.port.ac.uk/departments/academic/slas/staff/title,103868,en.html http://port.academia.edu/CharlotteTaylor


----------------------------------------------------------------------

Send Corpora mailing list submissions to
    corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit
    http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
    corpora-request at uib.no

You can reach the person managing the list at
    corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific than "Re: Contents of Corpora digest..."

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

End of Corpora Digest, Vol 62, Issue 26
***************************************


