[Corpora-List] Corpora Digest, Vol 106, Issue 2

mariem boughdiri mari.angelgirl at hotmail.fr
Sat Apr 2 17:12:48 CEST 2016

Good afternoon. I would like to ask a question: how can I extract the distributional information or domain of an article using its title? Thank you.

________________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of corpora-request at uib.no <corpora-request at uib.no>
Sent: Saturday, 2 April 2016 03:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 106, Issue 2

Today's Topics:

1. First Call for Papers: FIRE 2016 (Parth Mehta)
2. DSALT: Distributional Semantics and Linguistic Theory 2nd CFP (Gemma Boleda)
3. Final call for participation: 1st Translation Memory Cleaning Shared Task (Orasan, Constantin)
4. CFP: PROPOR 2016 Student Research Workshop - Extended Deadline (Pedro Paulo Balage)
5. Calling All Linguists: The Messaging Bots Need Your Help (Lisa Michaud)
6. Re: Calling All Linguists: The Messaging Bots Need Your Help (Angus B. Grieve-Smith)


Message: 1
Date: Fri, 1 Apr 2016 16:43:40 +0530
From: Parth Mehta <parth.mehta126 at gmail.com>
Subject: [Corpora-List] First Call for Papers: FIRE 2016
To: undisclosed-recipients:;

*Call for Papers*

FIRE 2016: Eighth meeting of the Forum for Information Retrieval Evaluation

8th - 10th December, Indian Statistical Institute, Kolkata

Submission Deadline: August 20, 2016

Website: fire.irsi.res.in


The 8th meeting of the Forum for Information Retrieval Evaluation (FIRE 2016) will be held at the Indian Statistical Institute, Kolkata, India. Started in 2008 with the aim of building a South Asian counterpart for TREC, CLEF and NTCIR, FIRE has since evolved continuously to meet the new challenges in multilingual information access. It has expanded to include new domains such as plagiarism detection, legal information access, mixed-script information retrieval and spoken document retrieval, to name a few.

Since 2015, FIRE has run a peer-reviewed conference track alongside the evaluation tasks. We are seeking the submission of high-quality and original full papers, short papers and demos. Submissions will be reviewed by experts on the basis of the originality of the work, the validity of the results, the chosen methodology, the writing quality and the overall contribution to the field of IR.

Short Paper submissions addressing any of the areas identified in the conference topics are also invited. Authors are encouraged to describe work in progress and late-breaking research results.

We also invite proposals for tutorials on recent advances in core IR and NLP research. They may focus on specific problems or specific domains in which IR/NLP research may be applied. Tutorials can be of half-day (3 hours plus breaks) duration. Tutorials are encouraged to be as interactive as possible. Tutorial proposals will be reviewed by the tutorial committee. A summary of the tutorial will be published in the conference proceedings.

======= TOPICS =======

Topics of interest include, but are not limited to:

* *IR Theory and Practice*
  - Searching, browsing, meta-searching
  - Data fusion, filtering and indexing
  - Language models, probabilistic IR, neural network based models
  - Learning to rank
  - Content classification, categorisation, clustering
  - Relevance feedback, query expansion, faceted retrieval
  - Topic detection and tracking, novelty detection
  - Recommender systems
  - Content-based filtering, collaborative filtering
  - Spam detection and filtering
  - Personalised, collaborative or user-adaptive IR
  - Adversarial IR
  - Privacy in IR
  - Contextual IR
  - Mobile, Geo and local search
  - Temporal IR, time-based modelling
  - Entity IR

* *Web and Social Media IR*
  - Link analysis
  - Query log analysis
  - Advertising and ad targeting
  - Spam detection
  - Trust, authority, reputation, ranking
  - Blog and online-community search, microblogs
  - Social search
  - Social tagging
  - Social networking and Web based communities
  - Trend identification and tracking
  - Time series and forecasting

* *User aspects*
  - User modelling, user studies, user interaction and history
  - Interactive IR
  - Task-based IR
  - Click models
  - Novel user interfaces for IR systems
  - Visualisation of queries, search results or content
  - Multimodal aspects, multimodal querying

* *IR system architectures*
  - Distributed and peer to peer IR
  - Cloud IR
  - Federated IR
  - Aggregated Search
  - Fusion/Combination
  - Open, interoperable and flexible systems
  - Performance, scalability, efficiency
  - Architectures and platforms
  - Crawling and indexing
  - Compression, optimisation
  - Map/Reduce for IR

* *Content representation and processing*
  - IR for semi-structured documents
  - IR for semantically annotated collections, semantic search
  - Reasoning for IR
  - Meta information and structures, metadata
  - Query representation, query reformulation
  - Text categorisation and clustering
  - Text data mining
  - Opinion mining, sentiment analysis, argumentation mining
  - Cross-language retrieval, multilingual retrieval
  - Machine translation for IR
  - Question answering
  - Natural language processing
  - Summarization for IR

* *Evaluation*
  - Evaluation methods and metrics
  - Building test collections
  - Experimental design
  - Crowdsourcing for evaluation, human computing
  - User-oriented and user-centred test and evaluation
  - Metric comparison and evaluation
  - Offline vs online evaluation

* *Multimedia and cross-media IR*
  - Speech retrieval
  - Image and video retrieval
  - Entity retrieval
  - Digital music, radio and broadcast retrieval
  - Virtual reality and information access
  - Cross-modal processing and search

* *Applications*
  - Digital libraries
  - Enterprise and intranet search
  - Desktop search
  - Mobile IR
  - Genomic IR, IR for chemical structures
  - Medical IR
  - Legal IR, patent search
  - eScience
  - The Internet of Things

*Submission Guidelines*

Detailed submission guidelines will soon be available at http://fire.irsi.res.in/fire/submission

*Important dates*

- July 30, 2016 - Short Papers, Tutorial and Focused Panel proposals
- August 20, 2016 - Full papers due
- Oct 15, 2016 - Full papers, Short papers and Tutorial proposal acceptance notifications
- Oct 31, 2016 - Camera ready copies due
- Oct 31, 2016 - Early bird Registration deadline
- Dec 8-10, 2016 - Conference held in Kolkata, India

*Overall co-ordinators*

- Prasenjit Majumder (DA-IICT)

- Mandar Mitra (ISI Kolkata)

For queries related to the conference, please email us at irlab at daiict.ac.in

--
Regards,
Parth Mehta
DA-IICT


Message: 2
Date: Fri, 1 Apr 2016 13:24:08 +0200
From: Gemma Boleda <gemma.boleda at upf.edu>
Subject: [Corpora-List] DSALT: Distributional Semantics and Linguistic Theory 2nd CFP
To: corpora at uib.no

Second Call for Papers for

DSALT: Distributional Semantics and Linguistic Theory

ESSLLI 2016 Workshop

15-19 August 2016, Bolzano, Italy

* Two-page abstract submission deadline: April 7 2016 *

URL: http://esslli2016.unibz.it/?page_id=256


The DSALT workshop seeks to foster discussion at the intersection of distributional semantics and various subfields of theoretical linguistics, with the goal of boosting the impact of distributional semantics on linguistic research beyond lexical semantic phenomena, as well as broadening the empirical basis and theoretical tools used in linguistics. We welcome contributions regarding the theoretical interpretation of distributional vector spaces and/or their application to theoretical morphology, syntax, semantics, discourse, dialogue, and any other subfield of linguistics. Potential topics of interest include, among others:

* distributional semantics and morphology: How do results in the distributional semantics-morphology interface impact theoretical accounts of morphology? Can distributional models account for inflectional morphology? Can they shed light on phenomena like productivity and regularity?

* distributional semantics and syntax: How can compositionality at the semantic level interact with syntactic structure? Can we go beyond the state of the art in accounting for the syntax-semantics interface when it interacts with lexical semantics? How can distributional accounts for gradable syntactic phenomena, e.g. selectional preferences or argument alternations, be integrated into theoretical linguistic accounts?

* distributional semantics and formal semantics: How can distributional representations be related to the traditional components of a semantics for natural languages, especially reference and truth? Can distributional models be integrated with discourse- or dialogue-oriented semantic theories like file change semantics or inquisitive semantics?

* distributional semantics and discourse: Distributional semantics has been shown to be able to model some aspects of discourse coherence at a global level (Landauer and Dumais 1997, a.o.); can it also help with other discourse-related phenomena, such as the choice of discourse particles, nominal and verbal anaphora, or the form of referring expressions as discourse unfolds?

* distributional semantics and dialogue: Distributional semantics has traditionally been mostly static, in the sense that it creates a semantic representation for a word once and for all. Can it be made dynamic so it can help model, for example, phenomena related to Questions Under Discussion (QUDs) in dialogue? Can distributional representations help predict the relations between utterance units in dialogue?

* distributional semantics and pragmatics: Distributional semantics is based on the statistics of language use, and therefore should include information related to pragmatics of language. How do distributional models relate to such aspects of pragmatics as focus, pragmatic presupposition, or conversational implicature?


We solicit two-page (plus references) abstracts in at most 11pt font. No proceedings will be published, so workshop submissions may discuss published work (as well as unpublished work). The abstract submission deadline is April 7, 2016. Submissions are accepted by email at dsalt2016 at gmail.com.


Deadline for abstract submission: April 7 2016
Author notification: May 15 2016
Workshop dates: August 15-19 2016


Marco Baroni (University of Trento) Katrin Erk (University of Texas at Austin) Aurélie Herbelot (University of Trento) Alessandro Lenci (University of Pisa) Jason Weston (Facebook)


Nicholas Asher, Marco Baroni, Emily Bender, Robin Cooper, Ann Copestake, Katrin Erk, Ed Greffenstette, Aurelie Herbelot, Alessandro Lenci, Marco Marelli, Louise McNally, Sebastian Pado, Barbara Partee, Laura Rimell, Mark Steedman, Bonnie Webber, Galit Weidman Sassoon, Roberto Zamparelli.


Gemma Boleda (University of Trento) Denis Paperno (University of Trento)

--
Gemma Boleda
University of Trento
http://gboleda.utcompling.com


Message: 3
Date: Fri, 1 Apr 2016 12:15:29 +0000
From: "Orasan, Constantin" <C.Orasan at wlv.ac.uk>
Subject: [Corpora-List] Final call for participation: 1st Translation Memory Cleaning Shared Task
To: "CORPORA at UIB.NO" <CORPORA at UIB.NO>, "cluk at googlegroups.com" <cluk at googlegroups.com>

Call for participation in the 1st Translation Memory Cleaning Shared Task organised at the 2nd Workshop on Natural Language Processing for Translation Memories (NLP4TM 2016)

to be held at LREC 2016 (Portorož, Slovenia), May 28, 2016


The NLP4TM 2016 workshop proposes a shared task on cleaning translation memories. Participants in this task will be required to take pairs of source and target segments from translation memories and decide whether they are right translations. For the first task three language pairs have been prepared: EN-ES, EN-IT and EN-DE.

The data was annotated with information on whether the source and target content of each TM segment represent a valid translation. In particular, the following 3-point scale has been applied:

(1) The translation is correct.
(2) The translation is correct, but there are a few orthotypographic mistakes, so some minor post-editing is required.
(3) The translation is not correct (content missing/added, wrong meaning, etc.).

The annotation guidelines are available on the task's website. For each language pair, 2/3 of the annotated segments are provided for training and 1/3 will be provided for testing during the evaluation phase.

1. Tasks proposed

The participating teams can choose to participate in any or all of the following three tasks:

- Binary Classification (I): In this task, it is only required to determine whether a segment is right or wrong. For the first binary classification option, only tag (1) is considered correct because the translators do not need to make any modification, whilst tags (2) and (3) are considered wrong translations.

- Binary Classification (II): As in the first task, in this task it is only required to determine whether the segment is right or wrong. However, in contrast to the first task, a segment is considered correct if it was labelled by annotators as (1) or (2). Segments labelled (3) are considered wrong because they require major post-editing.

- Fine-grained Classification: In this task, the participating teams have to classify the segments according to the annotation provided in the training data: correct translations (1), correct translations with few orthotypographic errors (2), and wrong (3).
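The three task definitions above can be read as three mappings over the same annotation scale. A minimal Python sketch follows, assuming the annotations are available as the integers 1, 2 and 3. The function names and the True/False output encoding are illustrative only and are not the organisers' submission format, which the call says will be communicated later.

```python
VALID_TAGS = (1, 2, 3)  # the 3-point annotation scale described in the call


def _check(tag: int) -> None:
    """Reject anything that is not one of the three annotation tags."""
    if tag not in VALID_TAGS:
        raise ValueError(f"unexpected annotation tag: {tag!r}")


def binary_i(tag: int) -> bool:
    """Binary Classification (I): only tag 1 counts as a correct segment."""
    _check(tag)
    return tag == 1


def binary_ii(tag: int) -> bool:
    """Binary Classification (II): tags 1 and 2 count as correct segments."""
    _check(tag)
    return tag in (1, 2)


def fine_grained(tag: int) -> int:
    """Fine-grained Classification: the tag is kept unchanged."""
    _check(tag)
    return tag
```

The only segments on which the two binary tasks disagree are those tagged (2): they are wrong under task (I) and correct under task (II).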

2. Submission and Evaluation information

Participants are required to register their intention to participate by filling in the following form before 8th April 2016: http://goo.gl/forms/ELStRtrw9J

The organisers will provide the training and test set to the participating teams and they will be asked to submit the output of their systems in a format similar to the training set. The exact modality and formatting of submissions will be communicated to participants at a later stage.

For evaluation, standard measures such as precision, recall and F-measure will be used. In addition, the organisers may perform some manual error analysis. The extent of this analysis will depend on the number of systems submitted. For this reason, even though we do not plan to limit the number of runs submitted by participants, they will be required to indicate their primary (and secondary, if relevant) runs.
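For readers new to these measures, here is a minimal Python sketch of precision, recall and the balanced F-measure (F1) for a single class. The call does not say which class is treated as positive or how scores are averaged across classes, so the `positive` parameter and the function name are assumptions made for illustration, not the organisers' scoring script.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[Hashable],
    predicted: Sequence[Hashable],
    positive: Hashable,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the class given as `positive`."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted positive but actually something else.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: actually positive but predicted something else.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # Undefined ratios (zero denominators) are reported as 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1
```

For the fine-grained task, the same function could be called once per tag (1, 2 and 3) to obtain per-class scores.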

The participants are encouraged to release their systems and make them publicly available for future use. They are also encouraged not to use machine translation as one of the factors used to determine the class of a segment. This is because we are trying to encourage development of methods that can be run on large datasets without requiring a lot of computational resources.

In addition to submitting the output of their system, the participants will be asked to submit short contributions in the form of working notes describing their systems. They will be published on the workshop's website and submissions that are not accompanied by a description will not be considered.

All systems will be presented in a demo session during the workshop.

3. Important dates

Release of training data: second week of February 2016
End of registration: 8th April 2016
Evaluation phase: 14th - 27th April 2016
Ranking of systems and release of the test set annotations: 4th May 2016
Submission of working notes: 16th May 2016
Workshop date: 28th May 2016

4. Organising committee

Eduard Barbu, Translated, Italy
Carla Parra, Hermes, Spain
Luca Mastrostefano, Translated, Italy
Matteo Negri, FBK, Italy
Marco Turchi, FBK, Italy
Luisa Bentivogli, FBK, Italy
Constantin Orasan, University of Wolverhampton, UK


Message: 4
Date: Fri, 1 Apr 2016 13:59:55 +0100
From: Pedro Paulo Balage <pedrobalage at gmail.com>
Subject: [Corpora-List] CFP: PROPOR 2016 Student Research Workshop - Extended Deadline
To: nilc-l at icmc.usp.br, forum-lp at natura.di.uminho.pt, ce-pln at grupos.ufrgs.br, corpora at uib.no, naacl-latin-america at googlegroups.com, sbc-l at sbc.org.br
Cc: Fernando Batista <fernando.batista.pt at gmail.com>, rede at appia.pt, propor2016srw at gmail.com

(Apologies for multiple postings)

==================================================
Call For Papers
==================================================
PROPOR 2016 Student Research Workshop

13-15 July 2016 Tomar, Portugal


Extended Deadline for Submissions: April 15, 2016 (11:59pm GMT -12)

==================================================
The Student Research Workshop (SRW) is held in conjunction with PROPOR 2016. The Workshop is designed to provide a venue for student researchers in Computational Linguistics and Natural Language Processing to present their work. Students will receive feedback from an experienced researcher in the field. The SRW invites two types of submissions:

Research Papers - completed work or works-in-progress along with preliminary results. We encourage submissions from masters students and advanced undergraduates as well as from Ph.D. students.

Thesis Proposal - for advanced students who have decided on a thesis topic and are interested in feedback about their proposal and ideas about future directions for their work.

==================================================
Submission Criteria and Procedure:

- The document must focus on some aspect of written and spoken Portuguese processing and related issues. In general, any topic that falls within the scope of the PROPOR call for papers (http://propor2016.di.fc.ul.pt/) is appropriate.
- The document should reflect the work produced by a student (Undergraduate, Masters or Ph.D.). The list of authors is not limited to students, but the first author must be a student.
- Documents must be submitted in English.
- Documents must be submitted in PDF format, following the same style as PROPOR 2016. Submissions are limited to 5 pages, including all figures and tables, plus 1 page for references only, and should begin with an abstract of 250 words or fewer.
- Submission deadline is April 15, 2016 (extended). Entries must be submitted electronically (no hard copy submissions will be accepted) via the following link: https://easychair.org/conferences/?conf=proporsrw16
- Notifications to authors will be sent on May 3, 2016.
- Accepted entrants must prepare an A0 poster (portrait format) to be presented at a special session.

Important Dates:

- April 15, 2016 - Submission deadline (extended)
- May 3, 2016 - Notification to authors
- May 30, 2016 - Deadline for camera-ready

==================================================
Chairs of the Student Research Workshop:

Pedro Balage (Univ São Paulo, in São Carlos, Brazil) Fernando Batista (ISCTE-IUL, Portugal)

==================================================
Contact Details:

For any inquiries regarding the workshop, please send an email to propor2016srw at gmail.com


Message: 5
Date: Fri, 1 Apr 2016 09:27:42 -0400
From: Lisa Michaud <lisa.n.michaud at gmail.com>
Subject: [Corpora-List] Calling All Linguists: The Messaging Bots Need Your Help
To: corpora at uib.no

An article by my friend and colleague. Messaging bots need more than just blind machine learning to drive their interactions:

http://www.cmswire.com/digital-experience/calling-all-linguists-the-messaging-bots-need-help/


--
Lisa N. Michaud
lisa.n.michaud at gmail.com


Message: 6
Date: Fri, 1 Apr 2016 09:54:13 -0400
From: "Angus B. Grieve-Smith" <grvsmth at panix.com>
Subject: Re: [Corpora-List] Calling All Linguists: The Messaging Bots Need Your Help
To: corpora at uib.no

He's right, but why is he calling on linguists to do this? We've been ready to help for years - or at least I have, and I know several others.

It's the people building these bots that seem slow to recognize what linguists have to offer - and who think they can dispense with linguists if they have enough Machine Learning. So let's forward this call on to R&D managers and CS department chairs: You need linguists!

In that vein, anyone looking for a corpus linguist who also knows how to code, get in touch with me! http://grieve-smith.com/

On 4/1/2016 9:27 AM, Lisa Michaud wrote:
> An article by my friend and colleague. Messaging bots need more than
> just blind machine learning to drive their interactions:
> http://www.cmswire.com/digital-experience/calling-all-linguists-the-messaging-bots-need-help/
> --
> Lisa N. Michaud
> lisa.n.michaud at gmail.com <mailto:lisa.n.michaud at gmail.com>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


-Angus B. Grieve-Smith

grvsmth at panix.com


---------------------------------------------------------------------- Send Corpora mailing list submissions to

corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit

http://mailman.uib.no/listinfo/corpora or, via email, send a message with subject or body 'help' to

corpora-request at uib.no

You can reach the person managing the list at

corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific than "Re: Contents of Corpora digest..."


End of Corpora Digest, Vol 106, Issue 2 ***************************************
