[Corpora-List] Does anybody know a classified faq collection?

Ling Yin Yin.Ling at itri.brighton.ac.uk
Wed May 18 13:15:01 CEST 2005

Dear Eric, Andy, Suzan, Debbie and many others,

Thank you very much for your information.

Further to Eric's question, "Ling, can you name any other
researchers or research papers studying FAQs? This might help my case
for 'releasing' our Leeds SoC FAQ", the answer is as follows:

Many researchers study how to answer a question automatically by
matching it against the questions in an FAQ set. One example is
Robin Burke, Kristian Hammond and Julia Kozlovsky's FAQ Finder system.

In the question answering research community, people classify
questions into semantic classes such as "Time", "Name", "Number",
"Definition", etc. Refer to the Webclopedia system by

In the study of statistical methods for question answering,
people usually use large collections of FAQs to train machines to learn
the relation between questions and answers. For example, Radu Soricut
and Eric Brill's system used a corpus of 1 million question/answer
pairs, and Ittycheriah and Roukos's system used
4k question/answer pairs.

Thanks again!


-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of D Elliott
Sent: Tuesday, May 17, 2005 10:58 AM
To: Ling Yin
Subject: Re: [Corpora-List] Does anybody know a classified faq

Further to Eric Atwell's suggestion, I am also not aware of any FAQ
collection classified according to your preferred semantic classes, but
for my research into MT evaluation, I used texts from the Internet FAQ
Archives at:


Here you'll find an enormous number of FAQs listed by topic (A-Z). I
used text from the site to create a million-word corpus of FAQs on computer
software. But you'll also find anything from boats to bicycles, fashion
to fetishes, tattoos to textiles.

(Thanks to Andy Roberts - also at Leeds - who directed me to this site)

Debbie Elliott
Computer Vision and Language Research Group,
School of Computing,
University of Leeds,
Leeds LS2 9JT
United Kingdom.
Tel: 0113 3437288
Email: debe at comp.leeds.ac.uk
