[Corpora-List] Multiple category assignment

John F Sowa sowa at bestweb.net
Mon Aug 26 21:16:33 CEST 2013


On 8/25/2013 10:55 AM, Aliabbas Petiwala wrote:
> So should such multiple categories be represented as bitstrings , such
> that for n categories there would be a whopping 2^n assignments ? This
> would surely make the inter annotator agreement (IAA) scores very low
> for minor differences.

You might consider Formal Concept Analysis (FCA), which automatically derives lattices from such bit strings. For references, software, and demos, see the FCA home page:

http://www.upriss.org.uk/fca/fca.html
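
As a rough illustration of what "deriving a lattice from bit strings" means, here is a minimal Python sketch. It is not taken from any of the FCA tools listed on that page; the item names, the three-attribute context, and the function names are all made up for the example. It computes every formal concept (a set of items paired with the attributes they share) by closing the items' bit strings under bitwise AND.

```python
# Toy formal context (hypothetical data): each annotated item is a bit string
# over three attributes. Bit i set means the item has attribute i.
N_ATTRS = 3
context = {
    "item1": 0b110,
    "item2": 0b100,
    "item3": 0b011,
}


def concept_intents(context, n_attrs):
    """Return all attribute sets (as bitmasks) that are closed under intersection."""
    full = (1 << n_attrs) - 1      # intent of the bottom concept: every attribute
    intents = {full}
    for row in context.values():
        # Intersect the new row with every intent found so far.
        intents |= {row & intent for intent in intents}
    return intents


def concepts(context, n_attrs):
    """Pair each intent with its extent: the items that have all its attributes."""
    result = []
    ordered = sorted(concept_intents(context, n_attrs),
                     key=lambda m: (bin(m).count("1"), m))
    for intent in ordered:
        extent = sorted(o for o, row in context.items() if row & intent == intent)
        result.append((extent, intent))
    return result


for extent, intent in concepts(context, N_ATTRS):
    print(format(intent, "03b"), extent)
```

Ordering the resulting concepts by attribute inclusion gives the lattice. Real FCA software uses more efficient algorithms and draws the diagram for you.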

For examples, type any word into the demo for Roget's Thesaurus:

http://www.ketlab.org.uk/roget.html

This will generate a small lattice of terms from Roget's Thesaurus to display the "concept neighborhood" of the word you submit.

You can try submitting the same words to the WordNet demo to see the differences in concept neighborhoods they generate:

http://www.ketlab.org.uk/wordnet.html

If you represent annotations by bit strings that represent features or attributes, two strings that have minor differences will represent different "concepts", but they will have a common generalization in the lattice.

Depending on your application, this property might be an advantage rather than a disadvantage.
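
To make the "common generalization" concrete, here is a small Python sketch under the assumption that each annotation is a bitmask over a fixed category list. The category names and helper functions are hypothetical, chosen only for illustration. The common generalization of two annotations is the set of categories both annotators assigned, which is the bitwise AND of the two strings.

```python
# Hypothetical category inventory; position in the list = bit position.
CATEGORIES = ["news", "sports", "politics", "opinion"]


def to_bits(labels):
    """Encode a collection of category labels as a bitmask."""
    return sum(1 << CATEGORIES.index(label) for label in set(labels))


def from_bits(mask):
    """Decode a bitmask back into a list of category labels."""
    return [c for i, c in enumerate(CATEGORIES) if (mask >> i) & 1]


def common_generalization(a, b):
    """Attributes shared by both annotations (bitwise AND of the bit strings)."""
    return a & b


# Two annotators who differ on a single category.
ann1 = to_bits(["news", "politics"])
ann2 = to_bits(["news", "politics", "opinion"])

print(ann1 == ann2)                                      # exact match fails
print(from_bits(common_generalization(ann1, ann2)))      # shared categories
```

The two strings are unequal, so an exact-match comparison counts them as a disagreement, but their common generalization still records that both annotators chose "news" and "politics".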

John


