[Corpora-List] Finding representative terms

radev at umich.edu radev at umich.edu
Tue Dec 27 03:35:00 CET 2005


You should consider using TF*IDF instead of IDF alone. First, compute IDF
from a large external corpus; this works even if you have only a single
input document. Then, compute TF for each word in each of your input
documents and rank the words by the product. A typical outcome would be:

         IDF    TF   TF*IDF
the      0.01   20     0.20
today    1.00    2     2.00
Paris    5.00    2    10.00
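
The two steps above could be sketched in Python roughly as follows. This is only a minimal illustration, not a reference implementation: the names tokenize, compute_idf and top_terms are hypothetical, the tokenizer is deliberately naive, and the add-one smoothing in the IDF formula (so that words missing from the external corpus still get a finite, high IDF) is an assumption rather than something prescribed here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive lowercase word tokenizer (hypothetical helper, for illustration only).
    return re.findall(r"[a-z']+", text.lower())


def compute_idf(reference_docs):
    # IDF from an external corpus: log((N + 1) / (df + 1)).
    # The add-one smoothing is an assumption made for this sketch.
    n = len(reference_docs)
    df = Counter()
    for doc in reference_docs:
        df.update(set(tokenize(doc)))  # count each word once per document
    idf = {word: math.log((n + 1) / (count + 1)) for word, count in df.items()}
    default_idf = math.log(n + 1)  # used for words unseen in the corpus (df = 0)
    return idf, default_idf


def top_terms(document, idf, default_idf, k=3):
    # TF is the raw count of each word in the input document.
    tf = Counter(tokenize(document))
    scores = {w: c * idf.get(w, default_idf) for w, c in tf.items()}
    # Highest TF*IDF first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    reference = [
        "the cat sat",
        "the dog ran",
        "the weather today",
        "the news today is good",
    ]
    idf, default_idf = compute_idf(reference)
    for word, score in top_terms("The Paris trip: Paris is the place today",
                                 idf, default_idf):
        print(f"{word}\t{score:.2f}")
```

In practice the external corpus should be far larger than this toy list, so that frequent words such as "the" receive an IDF close to zero while rare words such as "Paris" score high.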

Drago

Delip Rao wrote:

> Hi,
>
> Is there any work that tries to find the most
> important/representative words from a document? I have
> tried using IDF but results were very poor. Also IDF
> does not make sense if we have a single document and
> want to get the most important term(s) out of it.
>
> Thanks!
> Delip




--
Dragomir R. Radev radev at umich.edu
Associate Professor of Information, Electrical Engineering and
Computer Science, and Linguistics, the University of Michigan, Ann Arbor
Phone: 734-615-5225 Fax: 734-764-2475 http://www.si.umich.edu/~radev




