[Corpora-List] congressional-speech dataset available

Lillian Lee llee at cs.cornell.edu
Thu Dec 14 07:21:01 CET 2006


The "congressional speech" corpus and associated graph information
used in our "Get out the vote: Determining support or opposition from
Congressional floor-debate transcripts" EMNLP 2006 paper is now
available.

Specifically, the data includes speeches as individual documents,
together with:

* automatically-derived labels for whether the speakers supported
the legislation under discussion or not, allowing for
experiments with this kind of sentiment analysis

* indications of which debate each speech comes from (and the
position within the debate), allowing for consideration of
conversational structure

* indications of by-name references between speakers, allowing for
experiments with agreement classification (if one determines the
"true" labels from the support/oppose labels assigned to the
pair of speakers in question)

* the edge weights and other information we derived to create the
graphs we used for our experiments upon this data, facilitating
implementation of alternative graph-based classification methods
upon the graphs we constructed

The download site is:
http://www.cs.cornell.edu/home/llee/data/convote.html

Matt Thomas, Bo Pang, and Lillian Lee







More information about the Corpora-archive mailing list