[Corpora-List] Words ambiguous for sentiment

Keith Douglas Charles Stuart kstuart at idm.upv.es
Sun Apr 9 21:22:28 CEST 2017


Thanks to Valerio, Jorge and Ramesh. Sorry to be so slow getting back to you. The background to my question is that I am helping on a research project which involves developing software to carry out sentiment analysis. The project can be found here: http://tecnolengua.uma.es/?page_id=8

A paper describing the software can be found here: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/viewFile/5422/3186

The problem that I was referring to can be seen from two perspectives or traditions. One is a more linguistic tradition (corpus linguistics) and the other a more NLP/Computer Science tradition (computational linguistics).

The corpus linguistics perspective probably starts with Firth: “You shall know a word by the company it keeps”. This is in line with Ramesh’s comments about meanings arising from contexts. I would certainly agree with this, but context is a big word. Context may be a horizon (or window) of 5 words to the left and 5 to the right of a node word. It may be a dynamic window, so that context extends a lot further. It may be the whole document, or the meaning of the word may be affected by the genre or register. Word meanings can predict context and be predicted by context.
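To make the fixed-window idea concrete, here is a minimal sketch in Python. The function name context_window, the span parameter and the example sentence are my own choices for illustration; they do not come from any particular concordancer.

```python
def context_window(tokens, index, span=5):
    """Return the words up to `span` positions left and right of tokens[index]."""
    # max(0, ...) stops the left slice from wrapping around at the start of the text.
    left = tokens[max(0, index - span):index]
    # Slicing past the end of a list is safe in Python, so no upper bound check is needed.
    right = tokens[index + 1:index + 1 + span]
    return left, right


# Illustrative sentence with "back" as the node word.
tokens = "we thought the team was back on track after the break".split()
node = tokens.index("back")
left, right = context_window(tokens, node, span=2)
print(left, "[back]", right)
```

A dynamic window would replace the fixed span with some stopping rule (a sentence boundary, for instance), and the whole-document case simply uses every other token.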

Of course, a parser (or a POS tagger) can pick up (as implied by Jorge) that “smashing” is an adjective in one case and a verb in another. So the evaluative meaning of a word may change depending on its local grammar, in the sense of the link between specific language patterns and evaluation proposed in Hunston & Sinclair (2000).

As Sinclair showed many years ago, words like ‘back’ have multiple senses and can be found in negative idioms such as “back the wrong horse”, “back against the wall” etc. but also in positive idioms such as “back on track”. So, as Valerio has suggested, one could use synsets where a different polarity score is given for each meaning of the same word. The authors of that approach introduce the term polypathy for the property of a word having senses that are spread apart on the polarity scale (the polypathy of a lemma being calculated as the standard deviation of the polarity scores of its possible senses). This solution is lexicon-based: you need to plug the positive and negative scores from SentiWordNet into the software system.
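The polypathy calculation described above can be sketched in a few lines of Python. The per-sense scores below are invented for illustration and are not taken from SentiWordNet; I also assume a single polarity value per sense (for example, positive score minus negative score) and the population standard deviation, neither of which is specified above.

```python
from statistics import pstdev

# Hypothetical per-sense polarity scores in [-1, 1]; NOT real SentiWordNet values.
sense_polarity = {
    "back": [-0.5, 0.0, 0.5],   # senses spread across the scale
    "table": [0.0, 0.0],        # senses that agree on polarity
}


def polypathy(scores):
    """Standard deviation of a lemma's per-sense polarity scores."""
    # Assumption: population standard deviation; a lemma with no scored senses gets 0.
    return pstdev(scores) if scores else 0.0


for lemma, scores in sense_polarity.items():
    print(lemma, round(polypathy(scores), 3))
```

A lemma whose senses all share one polarity gets a polypathy of zero, while a lemma like ‘back’, with senses at both ends of the scale, gets a high value and is therefore a candidate for sense-level rather than word-level scoring.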

I was thinking more along the lines of research carried out in the computational linguistics tradition. Recent vector-based models (for example, https://code.google.com/archive/p/word2vec/) can capture the rich relational structure of the lexicon by encoding word features, such as the contexts in which individual words occur, as distributed numerical representations. They take a text corpus as input and produce word vectors as output. So, what I would be interested in working on, at least theoretically, would be how to use the vector representations of words to predict the sentiment annotations of words from the contexts in which they appear.
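As a very rough sketch of what I mean, the Python below averages the vectors of the context words around a node word and assigns the label of the nearest class centroid by cosine similarity. This is a plain nearest-centroid classifier, not word2vec itself and not a method from the paper above. The two-dimensional vectors and the sentiment labels are toy values made up for illustration; in practice the vectors would come from a model trained on a corpus.

```python
import math

# Toy 2-dimensional "word vectors", invented for illustration only.
vectors = {
    "wrong": [-0.9, 0.1], "wall": [-0.7, 0.2], "against": [-0.5, 0.0],
    "track": [0.8, 0.1], "on": [0.3, 0.0], "horse": [0.0, 0.4],
}


def mean_vector(words):
    """Average the vectors of the context words that have a vector."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical annotated contexts of the node word "back".
training = [
    (["the", "wrong", "horse"], "negative"),
    (["against", "the", "wall"], "negative"),
    (["on", "track"], "positive"),
]


def centroids(examples):
    """One centroid per label: the mean of that label's context vectors."""
    by_label = {}
    for context, label in examples:
        by_label.setdefault(label, []).append(mean_vector(context))
    return {label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
            for label, vecs in by_label.items()}


def predict(context, cents):
    """Return the label whose centroid is most similar to the context vector."""
    v = mean_vector(context)
    return max(cents, key=lambda label: cosine(v, cents[label]))


cents = centroids(training)
print(predict(["on", "track"], cents))
print(predict(["the", "wall"], cents))
```

Unlike the lexicon-based solution, nothing here is looked up in a polarity dictionary: the sentiment of ‘back’ in a given occurrence is inferred entirely from the vectors of the words around it.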

Best,

Keith


