[Corpora-List] Multiple category assignment

John F Sowa sowa at bestweb.net
Mon Aug 26 21:16:33 CEST 2013

On 8/25/2013 10:55 AM, Aliabbas Petiwala wrote:
> So should such multiple categories be represented as bitstrings, such
> that for n categories there would be a whopping 2^n assignments? This
> would surely make the inter-annotator agreement (IAA) scores very low
> for minor differences.

You might consider Formal Concept Analysis (FCA), which automatically derives lattices from such bit strings. For references, software, and demos, see the FCA home page:


For examples, type any word into the demo for Roget's Thesaurus:


This will generate a small lattice of terms from Roget's Thesaurus to display the "concept neighborhood" of the word you submit.

You can try submitting the same words to the WordNet demo to see the differences in concept neighborhoods they generate:


If you represent annotations by bit strings that represent features or attributes, two strings that have minor differences will represent different "concepts", but they will have a common generalization in the lattices.

Depending on your application, this property might be an advantage rather than a disadvantage.
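
As a rough illustration of the point about common generalizations, here is a minimal Python sketch. It is not taken from any FCA package: the function names, the annotator names, and the four-bit patterns are all made up for the example. It assumes each annotation is a bit string with one bit per category, treats the bitwise AND of two strings as their shared attributes, and builds the attribute sets of the lattice by closing the annotations under intersection.

```python
# Illustrative sketch only: all names and bit patterns here are hypothetical.

def common_generalization(a: int, b: int) -> int:
    """Attributes shared by two annotations (bitwise AND of their bit strings)."""
    return a & b

def concept_intents(masks, n_bits):
    """Attribute sets of all concepts: the closure of the masks under intersection."""
    full = (1 << n_bits) - 1  # every attribute set; belongs to the bottom concept
    intents = {full}
    for m in masks:
        # Intersect the new mask with everything found so far.
        intents |= {m & i for i in intents}
    return intents

def extent(intent, annotations):
    """Names of the annotations that carry every attribute in `intent`."""
    return sorted(name for name, m in annotations.items() if m & intent == intent)

N_CATEGORIES = 4  # assumed number of categories
annotations = {
    "annotator_A": int("1100", 2),
    "annotator_B": int("1110", 2),
    "annotator_C": int("0110", 2),
}

shared = common_generalization(annotations["annotator_A"], annotations["annotator_B"])
print(format(shared, "04b"))

# List each concept as (attribute bit string, annotations that fall under it).
for intent in sorted(concept_intents(annotations.values(), N_CATEGORIES), reverse=True):
    print(format(intent, "04b"), extent(intent, annotations))
```

In this toy data, annotators A and B differ in one bit, so a strict equality test would count them as disagreeing, yet both fall under the concept with attributes 1100. An agreement measure could, under these assumptions, give partial credit based on how close two annotations sit in the lattice.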

