Nihil Obstat: Comparing baselines of keyword and learning based sentiment analysis
The goal of this post is to build a simple keyword-based sentiment analysis program based on SentiWordNet and evaluate it on the SFU Review Corpus, in order to compare its accuracy with that obtained through machine learning with WEKA, as described in my previous post "Baseline Sentiment Analysis with WEKA".
Read full article from Nihil Obstat: Comparing baselines of keyword and learning based sentiment analysis
SentiWordNet is a collection of concepts (synonym sets, or synsets) from WordNet that have been annotated with their polarity (whether they convey a positive or a negative feeling). Some interesting features include:
- As it is based on WordNet, it covers only English and the four main parts of speech (nouns, adjectives, adverbs and verbs). Multi-word expressions are included, encoded with underscores (e.g. "too_bad", "at_large").
- Each concept has attached positive and negative polarity scores.
SentiWordNet is distributed as a tab-separated file. The first column is the part of speech (POS), the second the synset ID (its WordNet offset), the third and fourth the positive and negative polarity scores (each between 0 and 1), the fifth the synset (the list of synonyms, each tagged with its sense as word#sense_number), and the last one the WordNet gloss (roughly speaking, the definition).
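To make the column layout concrete, here is a minimal Java sketch that parses one line of the file into its fields. It assumes the six-column layout of the SentiWordNet 3.0 file (POS, ID, PosScore, NegScore, SynsetTerms, Gloss); the class `SwnEntry` and its methods are hypothetical names, and the line in `main` is invented for illustration (its ID, scores and gloss are not taken from SentiWordNet). It also derives the objectivity score as 1 - (PosScore + NegScore), which is how SentiWordNet defines it.

```java
import java.util.List;

/** One data row of the SentiWordNet 3.0 file (sketch; six tab-separated columns assumed). */
public class SwnEntry {
    final String pos;         // part of speech: a, n, r or v
    final String id;          // synset ID (WordNet offset)
    final double posScore;    // positivity, between 0 and 1
    final double negScore;    // negativity, between 0 and 1
    final List<String> terms; // synset members, each as word#sense_number
    final String gloss;       // WordNet gloss (definition and examples)

    SwnEntry(String pos, String id, double posScore, double negScore,
             List<String> terms, String gloss) {
        this.pos = pos;
        this.id = id;
        this.posScore = posScore;
        this.negScore = negScore;
        this.terms = terms;
        this.gloss = gloss;
    }

    /** Objectivity: whatever remains once positivity and negativity are accounted for. */
    double objScore() {
        return 1.0 - (posScore + negScore);
    }

    /** Parses one line; returns null for comment lines ("#...") and lines without six columns. */
    static SwnEntry parse(String line) {
        if (line.startsWith("#")) return null;
        String[] cols = line.split("\t", -1); // -1 keeps an empty trailing gloss column
        if (cols.length < 6) return null;
        List<String> terms = List.of(cols[4].trim().split("\\s+"));
        return new SwnEntry(cols[0], cols[1], Double.parseDouble(cols[2]),
                Double.parseDouble(cols[3]), terms, cols[5]);
    }

    public static void main(String[] args) {
        // Made-up line for illustration only: ID, scores and gloss are not from SentiWordNet.
        SwnEntry e = parse("a\t00000001\t0.625\t0.125\tgood#1 fine#3\ta made-up gloss");
        // prints [good#1, fine#3] pos=0.625 neg=0.125 obj=0.25
        System.out.println(e.terms + " pos=" + e.posScore + " neg=" + e.negScore + " obj=" + e.objScore());
    }
}
```

Returning null for "#" lines means the same parser can be run over the file whether or not its comment lines have been deleted.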
Another interesting feature is that the SentiWordNet researchers have provided a very basic Java class named SWN3.java to query the database for a word/POS pair. This class loads the database and provides a function that outputs "positive", "strong_positive", "negative", "strong_negative" or "neutral" for a given pair according to the scores assigned to its synsets. It is very basic because it performs neither Word Sense Disambiguation nor even POS tagging, and the labels are defined heuristically (other definitions are possible). However, we can take advantage of it to implement a very basic sentiment classifier. First, set up the data:
- Download a copy of SentiWordNet.
- Rename the file to SentiWordNet_3.0.0.txt and put it in a data folder, relative to the location of your SWN3.java file. Alternatively, you can modify the class to use a different path or file name.
- Delete all lines starting with the symbol "#" from the SentiWordNet_3.0.0.txt file. HINT: these are the header and the last line of the file.
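The SWN3.java class itself is not reproduced here. As a rough illustration of the same idea, the following Java sketch builds a word#pos dictionary from the file and maps each pair to one of the five labels mentioned above. Everything specific in it is an assumption of ours, not the original class: the name `SimpleSwn`, averaging a word's senses with a 1/sense_number weight, the label thresholds, and the `classify` method, which has no POS tagger and therefore looks every token up as an adjective, counting positive minus negative labels (ties go to "positive"). It skips lines starting with "#" while loading, so it also tolerates a file whose header was not deleted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Sketch of a SentiWordNet word/POS lookup in the spirit of SWN3.java; not the original class. */
public class SimpleSwn {
    // "word#pos" -> sense-weighted average of (PosScore - NegScore), in [-1, 1]
    private final Map<String, Double> dictionary = new HashMap<>();

    public SimpleSwn(Reader source) {
        Map<String, double[]> sums = new HashMap<>(); // "word#pos" -> {weighted score sum, weight sum}
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("#") || line.trim().isEmpty()) continue; // comment/header lines
                String[] cols = line.split("\t", -1);
                if (cols.length < 6) continue; // not a data line
                double score = Double.parseDouble(cols[2]) - Double.parseDouble(cols[3]);
                for (String term : cols[4].trim().split("\\s+")) {
                    int hash = term.lastIndexOf('#');
                    if (hash <= 0) continue;
                    int sense = Integer.parseInt(term.substring(hash + 1));
                    double weight = 1.0 / sense; // assumption: sense 1 (most frequent) weighs most
                    double[] s = sums.computeIfAbsent(term.substring(0, hash) + "#" + cols[0],
                            k -> new double[2]);
                    s[0] += weight * score;
                    s[1] += weight;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        sums.forEach((key, s) -> dictionary.put(key, s[0] / s[1]));
    }

    /** Label for a word/POS pair (POS as in the file: a, n, r, v), or null if the pair is unknown. */
    public String extract(String word, String pos) {
        Double score = dictionary.get(word + "#" + pos);
        if (score == null) return null;
        if (score >= 0.5) return "strong_positive"; // thresholds are illustrative only
        if (score > 0.1) return "positive";
        if (score <= -0.5) return "strong_negative";
        if (score < -0.1) return "negative";
        return "neutral";
    }

    /** Naive document polarity: +1 per positive word, -1 per negative word (adjective lookups only). */
    public String classify(String text) {
        int balance = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            String label = extract(token, "a"); // no POS tagger: treat every token as an adjective
            if (label == null) continue;
            if (label.endsWith("positive")) balance++;
            else if (label.endsWith("negative")) balance--;
        }
        return balance >= 0 ? "positive" : "negative"; // ties go to "positive" (arbitrary choice)
    }

    public static void main(String[] args) {
        // Made-up entries for illustration; real data would come from data/SentiWordNet_3.0.0.txt.
        String data = "# header line\n"
                + "a\t00000001\t0.75\t0\tgreat#1\tfirst made-up gloss\n"
                + "a\t00000002\t0\t0.25\tgreat#2\tsecond made-up gloss\n"
                + "a\t00000003\t0\t0.875\tawful#1\tthird made-up gloss\n";
        SimpleSwn swn = new SimpleSwn(new StringReader(data));
        // score = (0.75 - 0.125) / 1.5, about 0.42, so this prints "positive"
        System.out.println(swn.extract("great", "a"));
        // two "awful" (-2) against one "great" (+1), so this prints "negative"
        System.out.println(swn.classify("Awful, awful film. Great cast."));
    }
}
```

To load the real file you could pass `new java.io.FileReader("data/SentiWordNet_3.0.0.txt")` instead of the `StringReader` (the `FileReader` constructor throws a checked `FileNotFoundException`, so the caller has to handle it). Keeping the document-level rule as a simple label count makes it easy to swap in a different heuristic before evaluating on the review corpus.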