Let me know when you get in John
________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of corpora-request at uib.no [corpora-request at uib.no]
Sent: 18 May 2012 18:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 59, Issue 21
1. Re: bilingual labeled corpora (Germán Sanchis Trilles)
2. Biomedical Text mining Position available (job / PhD
/Post-doc position announcement) (Martin Krallinger)
3. Ph.D. scholarship (Christer.Johansson at lle.uib.no)
4. Fully funded PhD studentships at the University of Brighton
(R.M.Salkie at brighton.ac.uk)
5. Language Resources for Public Security Applications Workshop
- reminder (Language and Technology Conference)
6. NAACL-HLT 2012 Last Call for Participation (Smaranda Muresan)
7. Re: A doubt concerning posting in Corpora Service.
(Juan Antonio Sabariego)
Message: 1
Date: Thu, 17 May 2012 12:27:19 +0200 (CEST)
From: Germán Sanchis Trilles <gsanchis at dsic.upv.es>
Subject: Re: [Corpora-List] bilingual labeled corpora
To: Ralf Steinberger <ralf.steinberger at jrc.ec.europa.eu>
Cc: CORPORA at uib.no
Thank you very much for the information. I am looking into the corpora you
pointed out, and I think they will actually be very useful for my research.
On Wed, 16 May 2012, Ralf Steinberger wrote:
> Dear Germán,
> We just released (today - the email is on its way!) a multi-label classification tool which has been trained for 22 languages and which comes with manually annotated topic descriptors, drawn from the EuroVoc thesaurus. The multi-label annotation is at document level. There are between twenty and forty thousand documents per language.
> You can find it at http://langtech.jrc.ec.europa.eu/Eurovoc.html .
> Maybe this corpus is useful for you.
> Should you be looking for individual aligned sentences, then the DGT-Translation Memory (DGT-TM) may be what you need. While the sentences in DGT-TM are not individually annotated, they are accompanied by a document identifier so that - with a bit of effort - you can retrieve the EuroVoc descriptors for these documents. DGT-TM exists in the same 22 languages and is downloadable from http://langtech.jrc.ec.europa.eu/DGT-TM.html .
> Ralf Steinberger (Ralf.Steinberger at jrc.ec.europa.eu)
> European Commission - Joint Research Centre (JRC)
> IPSC - GlobeSec - OPTIMA
> URL - Applications: http://emm.newsbrief.eu/overview.html
> URL - The science behind them: http://langtech.jrc.ec.europa.eu
> T.P. 267, Via E. Fermi 2749
> 21027 Ispra (VA), Italy
> Tel: +39 0332 78-6271
> Fax: +39 0332 78-5154
> Secretary: +39 0332 78-5648 or 9478
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Germán Sanchis Trilles
> Sent: 16 May 2012 12:56
> To: CORPORA at uib.no
> Subject: [Corpora-List] bilingual labeled corpora
> Dear list,
> For some SMT experiments I need bilingual corpora with different kinds of annotations, such as topic or dialog act labels (or other kinds of labels). Does anyone know of such corpora?
> Thanks in advance,
> best regards,
> Germán Sanchis-Trilles
Message: 2
Date: Thu, 17 May 2012 18:25:20 +0200
From: "Martin Krallinger" <mkrallinger at cnio.es>
Subject: [Corpora-List] Biomedical Text mining Position available (job / PhD / Post-doc position announcement)
To: <corpora at uib.no>
Title: Biomedical Text mining Position available (job / PhD / Post-doc position announcement)
Several types of contracts could be offered in our research group, including post-doctoral, PhD or post-graduate positions. Salaries will depend on the type of position, expertise and academic background. The working language is English.
Lab URL: http://www.cnio.es/ing/grupos/plantillas/presentacion.asp?grupo=50004294
Publication record (Alfonso Valencia): http://scholar.google.es/citations?user=4iB725QAAAAJ&hl=en
General description: The candidate will work in a multidisciplinary team dealing with the development and application of biomedical text mining and natural language processing techniques. The overall aim of this work is to develop and apply text mining and natural language processing technologies to biomedical literature, covering aspects related to automatic text classification using machine learning methods, the detection of entities of biological interest from text and the extraction and ranking of biological relations from the biomedical literature. A special focus will be given to certain topics including: cancer-relevant gene detection and relationship extraction.
Requirements:
(1) Applicants should have a solid background in computational linguistics, natural language processing, text mining or related areas.
(2) Ability to develop algorithms and software needed by natural language processing/text mining systems.
(3) Programming skills are required in at least one of the following languages: Python, Perl, Java, C/C++, Ruby.
(4) Good English communication skills.
(5) Interest in the biomedical field.
Expertise in the following areas is a plus when applying for the position:
- Training in topics related to statistics and machine learning.
- Development of online web applications.
- Previous experience with biomedical texts.
- Ability to work in an interdisciplinary team.
- A good scientific publication record would be an advantage.
- Familiarity with NLP tasks such as named entity recognition, information extraction and information retrieval.
Background on our research group: The position is available in the group of Dr. Alfonso Valencia, Director of the Structural Biology and Biocomputing Programme at the Spanish National Cancer Research Centre. The research group has contributed significantly to biomedical text mining research over the past years, from initial work related to the analysis of protein families, microarray data and protein interactions to the development of online applications such as the iHOP server, PLAN2L or the BioCreative Metaserver. The group collaborates with experimental biomedicine labs to integrate NLP and text mining data with bioinformatics results. It has been co-organizing community evaluation efforts in the BioNLP area, i.e. the BioCreative challenges.
Contact info: Requests for additional information or formal applications (including application letter, extensive CV and PhD/MA thesis description) can be sent to Martin Krallinger: mkrallinger at cnio.es
--------------------------------------------------
Martin Krallinger
Structural Computational Biology Group
Structural Biology and BioComputing Programme
Spanish National Cancer Research Centre (CNIO)
--------------------------------------------------
Message: 3
Date: Mon, 14 May 2012 14:21:14 +0200
From: Christer.Johansson at lle.uib.no
Subject: [Corpora-List] Ph.D. scholarship
To: corpora at uib.no
PhD Research Fellowships in the Department of linguistic, literary, and aesthetic studies
A PhD fellowship is available at the Department of linguistic, literary, and aesthetic studies from 1 September 2012.
The fellowship is within the research project CATO (Contextual Aspects of Text Organization) which is funded by the Norwegian Research Council. CATO is about writing flow, and how writing flow can be supported in future technical aids for weak writers. The applicant's work will be within the specifications of this project. The research is conducted in cooperation with professor Christer Johansson (UiB) and associate professor Per Henning Uppstad (ReadingCenter at UiS).
Message: 4
Date: Tue, 15 May 2012 15:39:40 +0000
From: R.M.Salkie at brighton.ac.uk
Subject: [Corpora-List] Fully funded PhD studentships at the University of Brighton
To: "CORPORA at UIB.NO" <CORPORA at UIB.NO>
[Note deadline: 8 June 2012]
The University of Brighton's Doctoral College invites applications from around the world for one of up to 40 new PhD studentships available for entry during the 2012/2013 academic year.
Two of the studentships are in linguistics - see below.
Full details: http://www.brighton.ac.uk/researchstudy/2012studentships/
Advertisement in Times Higher: http://www.timeshighereducation.co.uk/jobs_jobdetails.asp?ac=93409
Each studentship is worth £55,650 and will support full-time study over a three-year period, including £14,300 per year towards living expenses.
Modality in English and the semantics/pragmatics interface
http://www.brighton.ac.uk/researchstudy/2012studentships/arts-and-humanities/modality-in-english-and-the-semanticspragmatics-interface
The future of the languages of Europe (contrastive linguistics, English and German)
http://www.brighton.ac.uk/researchstudy/2012studentships/arts-and-humanities/the-future-of-the-languages-of-europe-contrastive-linguistics-english-and-german
These urls need to be on a single line. You can also find these topics easily via the Full details page above.
For informal discussion about the linguistics studentships, email Professor Raphael Salkie (r.m.salkie at bton.ac.uk).
For general enquiries about the scheme, or to be put in touch with a supervisor, please contact the Brighton Doctoral College by email at: doctoralcollegedean at brighton.ac.uk
Professor Raphael Salkie,
School of Humanities,
University of Brighton
Falmer, Brighton, BN1 9PH
England.
Fax: (+44) 01273 641873
Email: r.m.salkie at brighton.ac.uk
Home page: http://arts.brighton.ac.uk/staff/raf-salkie
Message: 5
Date: Thu, 17 May 2012 00:47:49 +0200 (CEST)
From: ltc at amu.edu.pl (Language and Technology Conference)
Subject: [Corpora-List] Language Resources for Public Security Applications Workshop - reminder
To: corpora at hd.uib.no
Dear Sir or Madam
This is to invite you to participate in the Language Resources for Public Security Applications Workshop (LRPS 2012) at LREC 2012. The workshop will take place in Istanbul (Turkey) on May 27th, 2012. More information regarding the workshop can be found at http://www.lrps.amu.edu.pl. We also recommend visiting the LREC 2012 site at http://www.lrec-conf.org/lrec2012/.
Best regards, LRPS Organizers
Message: 6
Date: Thu, 17 May 2012 18:29:06 -0400
From: Smaranda Muresan <smuresan at rci.rutgers.edu>
Subject: [Corpora-List] NAACL-HLT 2012 Last Call for Participation
To: liste_acl <acl at aclweb.org>, liste_isca <publ at isca-speech.org>, liste_corpora <corpora at uib.no>, liste_bionlp <bionlp at lists.ccs.neu.edu>, liste_elsnet <elsnet-list at elsnet.org>, liste_sigsem <sigsem at aclweb.org>, irlist at lists.shef.ac.uk
==============================================================================================
NAACL-HLT 2012 LAST CALL FOR PARTICIPATION
North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT 2012)
June 3 - June 8, 2012, Montreal, Canada
http://naaclhlt2012.org
Early Registration Closed
Late Registration Closes May 23 at 11:59pm East Coast Time
Registration: http://www.naaclhlt2012.org/registration/registration.php
Hotel Room Reservation: http://www.naaclhlt2012.org/participants/accomodation.php
The 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012) will be held June 3 - 8, 2012 at Le Centre Sheraton Montréal 1201, boul. René-Lévesque ouest, Montréal, (Québec), Canada.
The conference will include three days of technical presentations (June 4-6), one day of tutorials (June 3), and two days of workshops (June 7-8). In addition, this year the conference will be co-located with the First Joint Conference on Lexical and Computational Semantics (June 7-8).
[NAACL-HLT CONFERENCE PROGRAM] http://www.naaclhlt2012.org/conference/conference.php
* Eduard Hovy, Director of the Human Language Technology Group, Information Sciences Institute of the University of Southern California
* James W. Pennebaker, Centennial Liberal Arts Professor and Chair of Psychology at the University of Texas at Austin
[BEST PAPER AWARDS]
Best Full Paper Award: Vine Pruning for Efficient Multi-Pass Dependency Parsing: Alexander Rush and Slav Petrov
Best Short Paper Award: Trait-Based Hypothesis Selection For Machine Translation: Jacob Devlin and Spyros Matsoukas
IBM Best Student Paper Award: Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure: Oscar Täckström, Ryan McDonald, Jakob Uszkoreit
[TUTORIALS] June 3 (http://www.naaclhlt2012.org/conference/tutorials.php)
* 100 Things You Always Wanted to Know about Linguistics But Were Afraid to Ask* (Emily M. Bender)
* Structured Sparsity in Natural Language Processing: Models, Algorithms and Applications (André F. T. Martins, Mário A. T. Figueiredo, and Noah A. Smith)
* Arabic Dialect Processing Tutorial (Mona Diab and Nizar Habash)
* Natural Language Processing in Watson (Alfio M. Gliozzo, Aditya Kalyanpur, James Fan)
* Variational Inference for Structured NLP Models (David Burkett, Dan Klein)
* Processing modality and negation (Roser Morante)
* On-Demand Distributional Semantic Distance and Paraphrasing (Yuval Marton)
* Predicting Structures in NLP: Constrained Conditional Models and Integer Linear Programming NLP (Dan Goldwasser, Vivek Srikumar, Dan Roth)
[WORKSHOPS] NAACL-HLT 2012 features an expanded workshop program, with 16 workshops over two days (June 7-8). The workshops are:
* Cognitive Modeling and Computational Linguistics
* Future directions and needs in the Spoken Dialog Community
* Induction of Linguistic Structure
* Innovative Use of NLP for Building Educational Applications
* Language in Social Media
* Predicting and improving text readability
* Twelfth Meeting of the ACL-SIGMORPHON Computational Research in Phonetics, Phonology, and Morphology
* BioNLP
* Computational Linguistics for Literature
* Evaluation metrics and system comparison for automatic summarization
* Future of Language Modeling for HL
* Semantic Interpretation in an Actionable Context
* Syntactic Analysis of Non-Canonical Language
* Speech and Language Processing for Assistive Technologies
* Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)
* Statistical Machine Translation
For more information about the workshops visit: http://www.naaclhlt2012.org/conference/ws.php
[CONFERENCE VENUE] NAACL-HLT 2012 will be held at Le Centre Sheraton Montréal 1201, boul. René-Lévesque ouest, Montréal, (Québec), Canada. The negotiated conference discount rate expired on May 11, 2012. Due to the Grand Prix immediately following our event, the hotel is almost sold out. However, a few Club Level rooms are still available, and we have negotiated a discounted rate of $280 per night, subject to availability. Guests can access the site to book, modify, or cancel a reservation from May 11, 2012 to June 12, 2012. Simply follow https://www.starwoodmeeting.com/StarGroupsWeb/res?id=1205117094&key=322B
[REGISTRATION]
Early Registration Closed
Late Registration Closes May 23 at 11:59pm East Coast Time
Registration: http://www.naaclhlt2012.org/registration/registration.php
We hope to see you in Montreal!
General Conference Chair: Jennifer Chu-Carroll, IBM
Program Co-Chairs: Srinivas Bangalore, AT&T; Eric Fosler-Lussier, The Ohio State University; Ellen Riloff, University of Utah
Local Arrangements Chair: Priscilla Rasmussen, ACL Business Office, acl-AT-aclweb.org
Advisory committee: Sabine Bergler, Concordia University; Guy Lapalme, Université de Montréal
Workshops Co-Chairs: Colin Cherry, National Research Council of Canada; Mona Diab, Columbia University
Tutorials Co-Chairs: Jacob Eisenstein, CMU; Radu Florian, IBM T.J. Watson Research Center
Demos Co-Chairs: Aria Haghighi, Prismatic; Yaser Al-Onaizan, IBM T.J. Watson Research Center
Student Workshop Co-chairs: Rivka Levitan, Columbia University; Myle Ott, Cornell University
Faculty Advisors: Roger Levy, UCSD; Ani Nenkova, University of Pennsylvania
Publications: Nizar Habash, Columbia University; William Schuler, OSU
Publicity: Smaranda Muresan, Rutgers University
Exhibits: Joel Tetreault, Educational Testing Service
Webmaster: Dirk Hovy, USC/ISI
Smaranda Muresan
Assistant Professor
Library and Information Science Department
School of Communication and Information
Rutgers University
4 Huntington St
New Brunswick, NJ, 08901
smuresan at rci.rutgers.edu
Message: 7
Date: Fri, 18 May 2012 00:30:22 +0200
From: Juan Antonio Sabariego <j.a.sabariego at gmail.com>
Subject: Re: [Corpora-List] A doubt concerning posting in Corpora Service.
To: corpora at uib.no
Dear members of CorporaList,
Some members of the Universitat Pompeu Fabra pragmatics group are working on a project about evidential and epistemic markers in five languages: Spanish, Catalan, English, French and German. We would be very grateful for any help in finding transcribed oral conversational corpora in those five languages for our research. We have found some, but they are either too limited for our purposes or must be paid for, and we would first like to look at the free versions. We expect you can give us a clear picture of the state of oral corpora in these five languages. Thank you in advance.
Juan Antonio Chica Sabariego Universitat Pompeu Fabra
On Thu, May 17, 2012 at 10:57 PM, Knut Hofland <Knut.Hofland at uni.no> wrote:
> If you want to post to the list, use the address corpora at uib.no
> Best regards
> Knut Hofland
> On Tue, 15 May 2012, Juan Antonio Sabariego wrote:
> Some members of the Universitat Pompeu Fabra pragmatics group are working
>> on a project about evidential and epistemic units in five different
>> namely Spanish, Catalan, English, French and German. We would be much
>> pleased if you could offer some help about oral conversational corpora,
>> transcribed, for the purpose of our investigation. I have found some of
>> them but they are quite limited for our investigation or paying. Thank you
End of Corpora Digest, Vol 59, Issue 21 ***************************************