Danilo Giampiccolo giampiccolo at itc.it
Wed Dec 20 13:01:01 CET 2006

Apologies for cross-postings.



Textual entailment recognition, i.e. the task of deciding, given two
texts, whether the meaning of one text can be plausibly inferred from
the other, has recently gained popularity, following the two
previous rounds of the PASCAL Recognizing Textual Entailment (RTE)
challenge. One of the key elements of this success is probably the fact
that textual entailment may serve as a unifying generic framework for
applied modeling of semantic inference, and captures generically a broad
range of inferences that are relevant for different applications, such as
Question Answering (QA), Information Extraction (IE), Summarization,
Machine Translation, paraphrasing, and for certain types of queries in
Information Retrieval (IR). More specifically, the RTE challenge aims to
focus research and evaluation on this shared underlying semantic
inference task and isolate it from other application specific problems.

The goal of the first RTE challenge was to provide a new benchmark to
test progress in recognizing textual entailment, and to compare the
achievements of different groups. This goal has proven to be of great
interest, and the community response encouraged us to gradually extend
the scope of the original task. The second RTE Challenge built on the
success of the first, with 23 participating groups from around the world
(as compared to 17 for the first challenge). The number of participants
and their contributions to the discussion at the Workshop in April 2006
(Venice, Italy) demonstrated that Textual Entailment is a quickly
growing field of NLP research. Already, the workshops have spawned an
impressive number of publications in major conferences, with more work
in progress and about 150 downloads to date of the RTE-2 dataset (see
http://aclweb.org/aclwiki/index.php?title=Textual_Entailment for a
comprehensive reference list).

RTE-3 follows the same basic structure as the previous campaigns, in
order to facilitate the participation of newcomers and to allow
"veterans" to assess the improvements of their systems. Nevertheless, a
couple of innovations are introduced:

* A limited number (about 20%) of longer texts - i.e. one
paragraph long - are introduced as a first step towards addressing
broader settings which require discourse analysis.
* An RTE Resource Pool has been created as a shared central
location for resource contributors and users (see below).

The input to the challenge task consists of pairs of text units, termed
T(ext) - the entailing text, and H(ypothesis) - the candidate entailed
text. The task consists of recognizing a directional relation between
the two text fragments, deciding whether T entails H or not. More
specifically, we say that T entails H if, typically, a human reading T
would infer that H is most likely true. System results will be compared
to a human-annotated gold-standard test set.

The following T/H pairs exemplify the task proposed in the challenge:

T: The flights begin at San Diego's Lindbergh Field in April, 2002 and
follow the Lone Eagle's 1927 flight plan to St. Louis, New York, and Paris.
H: Lindbergh began his flight from Paris to New York in 2002.

T: The world will never forget the epic flight of Charles Lindbergh
across the Atlantic from New York to Paris in May 1927, a feat still
regarded as one of the greatest in aviation history.
H: Lindbergh began his flight from New York to Paris in 1927.

T: Medical science indicates increased risks of tumors, cancer, genetic
damage and other health problems from the use of cell phones.
H: Cell phones pose health risks.

T: The available scientific reports do not show that any health problems
are associated with the use of wireless phones.
H: Cell phones pose health risks.

The development and test sets are based on multiple data sources and are
intended to be representative of typical problems encountered by applied
text understanding systems. Examples are mostly based on entailment
cases that were/were not handled successfully by existing systems, and
also include a small proportion of manually created examples that
simulate an application scenario. While most of the text pairs are drawn
from the domain of political and business news, other domains, such as
sports, science, and technology are also represented, even though any
domain-specific language is avoided and the vocabulary used is that of
an average educated person.

As in RTE-2, data types corresponding to the following application areas
are used (see the website for details on how application data map to RTE
examples):

a. Question Answering (QA)
b. "Propositional" Information Retrieval (IR)
c. Information Extraction/Relation Extraction (IE)
d. Summarization (SUM) (including PYRAMID-based data)

This year, a limited proportion of longer texts - up to a short
paragraph - are included, allowing for discourse analysis. However, the
majority of examples remain similar to those in the previous challenges,
providing pairs with relatively short texts.

In order to avoid copyright problems, data is limited to either what has
already been publicly released by official competitions or else is drawn
from freely available sources such as Wikinews and Wikipedia.


One of the key conclusions at the 2nd RTE Challenge Workshop was that
entailment modeling requires vast knowledge resources that correspond to
different types of entailment reasoning. Examples of useful knowledge
include ontological and lexical relationships, paraphrases and
entailment rules, meaning-entailing syntactic transformations and
certain types of world knowledge. Textual entailment systems also
utilize general NLP tools such as POS taggers, parsers and named-entity
recognizers, sometimes posing specialized requirements on such tools.
With so many resources being continuously released and improved, it can
be difficult to know which particular resource to use when developing a
system. In response, RTE-3 includes a new activity for building a
Textual Entailment Resource Pool, which will serve as a portal and forum
for publicizing and tracking resources and reporting on their use.

We actively solicit both RTE participants and other members of the NLP
community who develop or use relevant resources to contribute to the
Textual Entailment Resource Pool. Contributions include links and
descriptions of relevant resources as well as informational postings
regarding resource use and accumulated experience. RTE-3 participants
who utilize such resources are expected to cite them and evaluate their
impact, while the overall utility of notable resources will be
reviewed in the RTE-3 organizers' paper, which we hope will reward
contributors of useful resources.

The Textual Entailment Resource Pool is hosted as a sub-zone of the ACL
Wiki for Computational Linguistics. The resource pool has been seeded
with a few resources; however, its usefulness relies on the community's
(including your own!) contributions. The Textual Entailment Resource Pool is
available at


We would like to draw attention to a preliminary announcement, which is
currently being circulated, for a special issue of the Journal of
Natural Language Engineering on Textual Entailment. The call for the
special issue is anticipated for April 2007 with submission deadline
several months later. This schedule will allow interested participants
of RTE-3 to report their recent results, following the RTE-3 workshop.
The call for the special issue will be open, covering a broader scope
than exhibited in the RTE challenge.


Development Set Release: 20 December 2006
Test Set Release: 1 March 2007
Deadline for participants' submissions: 12 March 2007
Release of individual results: 16 March 2007
Deadline for participants' reports: 2 April 2007
Camera-ready version of reports: 9 May 2007
Workshop: Early summer 2007
(We have proposed having the RTE-3 workshop as an ACL 2007 workshop, to
be held at the end of June in Prague.)

Danilo Giampiccolo, CELCT (Trento), Italy (coordinator)
Bernardo Magnini, ITC-irst (Trento), Italy (advisor)
Ido Dagan, Bar Ilan University, Israel (supervisor and advisor)
Bill Dolan, Microsoft Research, USA
Patrick Pantel, ISI, USA (Textual Entailment Resource Pool)

The preparation and running of this challenge has been supported by the
EU-funded PASCAL Network of Excellence on Pattern Analysis, Statistical
Modeling and Computational Learning.

The data sets have been created and annotated by the Butler Hill Group
(Microsoft) and CELCT.


For registration, further information and inquiries, please visit the
challenge website:


CONTACT: Danilo Giampiccolo <info at celct.it>, with [RTE3] in the
subject header.
