Radian6 Sentiment Analysis Review – Does Natural Language Processing Work?

Radian6, one of the primary social media monitoring tools we use at Ignite, recently launched a new feature: automatic sentiment analysis. As Chris Newton, Radian6 Founder and CTO, explains in his blog post: “Radian6 automated sentiment reviews on-topic posts as they come in, determines the sentiment of the post at the sentence level, and aggregates a positive, negative, or neutral designation at the post level based on specified sentiment keywords and phrases.”

I should mention that Radian6 continues to be one of our primary tools at Ignite. The noise filtering is second to none, and we should know – we are tracking some very “noisy” keywords for our clients. So, while we consider the addition of sentiment analysis to Radian6 to be a great bonus, we would have continued to use their service even without it.

Opening Skepticism

Our social monitoring guru at Ignite, Brian Chappell, has often questioned the accuracy, and therefore the usefulness, of automatic sentiment analysis, which is typically done with what’s called Natural Language Processing. Without a trained human reading each and every post, these systems are not very good at accurately marking a post as Positive, Negative or Neutral with regard to a specific brand or product mention. However, the goal of sentiment analysis, trending the general sentiment about a brand or product over time, is a very worthy one, especially for a social media agency like Ignite as we try to move the relevant needles for our clients and report on that growth.

Sentiment Accuracy

I decided to run a test to see how Radian6’s system performed. For this test, I chose a Radian6 Topic Profile (i.e., a set of keyphrases around a single brand) that we have been tracking for a while. First, I reviewed the accuracy of the automatic sentiment analysis over a very short time period. From 12/17/09 – 12/18/09, Radian6 pulled in 420 mentions across all mediums for this topic profile. The breakdown across mediums is shown in the chart:

[Chart: R6 Graph1]

Of the 420 mentions, 325 (77%) were not marked with sentiment (i.e., neutral), leaving 95 that were marked with sentiment:

  • 51 were marked as Positive
  • 44 were marked as Negative
  • No posts were marked as Mixed, Somewhat Positive or Somewhat Negative

I then read all 95 of the sentimented posts to check whether the sentiment marking was correct. After review, only 34 were truly sentimented posts (i.e., only 36% of those automatically marked with sentiment actually seemed to have any measurable sentiment). Of those:

  • 22 Positive
  • 10 Somewhat Positive
  • 0 Mixed
  • 2 Somewhat Negative
  • 0 Negative

Note 1: I only reviewed the posts that had been automatically marked with sentiment (i.e., not the supposed Neutral posts), so I don’t know how many of these Neutral posts should have actually been marked with Positive or Negative sentiment.

Note 2: My criteria for marking sentiment were fairly basic. I did not have anyone else read through and confirm my analysis, but I feel confident that if someone did, we would not have varied much (at least not among the 3 big categories of Positive, Neutral and Negative).

One of the most striking changes was that the vast majority of the 44 posts originally marked as Negative turned out to be Neutral (and some were actually Positive). Many of these were retweets that included an @message to a Twitter user with the handle “@MyChaos”. From what I can tell, the word “chaos” is the only possible reason why these tweets were marked with Negative sentiment. Hopefully, the automatic sentiment analysis will be improved to recognize a Twitter username (i.e., @user_name) and exclude it when analyzing for sentiment. It is possible that the Negative marking had another cause, but if so, it must be even stranger than the @MyChaos handle triggering it.
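To illustrate the kind of fix I am hoping for, here is a minimal Python sketch. It is not Radian6’s actual method, which I have no visibility into. The keyword list, the substring matching, and the function names are all assumptions made up for illustration. It simply shows how a naive keyword matcher could trip on “@MyChaos” and how removing handles first avoids that.

```python
import re

# Hypothetical negative-keyword list, for illustration only.
NEGATIVE_KEYWORDS = {"chaos", "terrible", "awful"}

# A Twitter handle: "@" not preceded by a word character (so email
# addresses are left alone), followed by letters, digits, or underscores.
HANDLE_PATTERN = re.compile(r"(?<!\w)@\w+")


def strip_handles(text: str) -> str:
    """Replace @user_name handles with a space before sentiment matching."""
    return HANDLE_PATTERN.sub(" ", text)


def has_negative_keyword(text: str) -> bool:
    """Naive substring match, assumed here as one way the mismark could arise."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in NEGATIVE_KEYWORDS)


tweet = "RT @MyChaos: loving the new release"
print(has_negative_keyword(tweet))                 # handle still present
print(has_negative_keyword(strip_handles(tweet)))  # handle removed first
```

With the handle left in, the substring “chaos” inside “@MyChaos” matches. With handles stripped, the same tweet no longer matches any negative keyword, while a tweet that really uses the word “chaos” still would.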

Where the original Positive to Negative ratio was 1.16 : 1 (51 to 44), the new ratio after a human review was 16 : 1 (32 Positive or Somewhat Positive to 2 Somewhat Negative), an order of magnitude difference.
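For anyone who wants to check the arithmetic, here is a short Python sketch that recomputes these figures from the counts reported above. The variable names are my own, and grouping “Somewhat Positive” with Positive and “Somewhat Negative” with Negative is the assumption behind the 16 : 1 figure.

```python
# Counts as automatically marked by Radian6.
auto_positive = 51
auto_negative = 44
auto_marked = auto_positive + auto_negative  # 95 sentimented posts

# Counts after the human review of those 95 posts.
reviewed = {
    "Positive": 22,
    "Somewhat Positive": 10,
    "Mixed": 0,
    "Somewhat Negative": 2,
    "Negative": 0,
}
true_sentimented = sum(reviewed.values())  # 34 posts

# Share of auto-marked posts that really carried sentiment.
true_share = true_sentimented / auto_marked

# Positive-to-Negative ratios before and after review.
auto_ratio = auto_positive / auto_negative
human_positive = reviewed["Positive"] + reviewed["Somewhat Positive"]
human_negative = reviewed["Negative"] + reviewed["Somewhat Negative"]
human_ratio = human_positive / human_negative

print(f"Truly sentimented: {true_share:.0%}")        # 36%
print(f"Automatic ratio:   {auto_ratio:.2f} : 1")    # 1.16 : 1
print(f"Human ratio:       {human_ratio:.0f} : 1")   # 16 : 1
```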

In my analysis of the sentimented posts, the posts marked as positive were much more likely to be marked correctly than the posts marked as negative. This may have a lot to do with the specific Radian6 Topic Profile (i.e., brand) I was analyzing. Anecdotal evidence suggests that this brand generally receives very favorable mentions. Therefore, it’s likely that a brand with a more negative consumer perception will have more ‘false positives’ as opposed to ‘false negatives’.

Sentiment Distribution across Mediums

It is also relevant to note which mediums supply the most sentimented posts. Therefore, I ran a three-month test (the longest that Radian6 will allow) from 9/16/09 – 12/16/09 on the same Topic Profile. Here are the results:

[Chart: R6 Graph2]
[Chart: R6 Graph3]

Of the 15.4% of posts that were marked with sentiment, 61% came from MicroMedia (i.e., Twitter); however, only 28% of all posts (sentimented and non-sentimented) were from MicroMedia. This is a very positive sign. Automatic sentiment analysis is difficult to do, and the longer a post is (i.e., a blog or forum post), the harder it is to determine the true sentiment of a brand or product’s mention within the post. A post might be positive in the first paragraph and negative in the second, or the post may have an overall sentiment that is unrelated to the specific brand/product mention. Therefore, short MicroMedia posts like Tweets are the easiest types of posts for determining accurate sentiment. I am pleased to see that the sentiment analysis in Radian6 is overweighting Twitter sentiment, since this should be the most accurate judge of sentiment across the existing Radian6 mediums.
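To put a number on that overweighting, here is a rough Python sketch that works only from the three percentages quoted above. The variable names are mine, and the derived marking rates are estimates that assume those rounded percentages are exact.

```python
# Percentages reported for the three-month test.
sentimented_share_of_all = 0.154    # posts marked with any sentiment
micro_share_of_sentimented = 0.61   # MicroMedia share of sentimented posts
micro_share_of_all = 0.28           # MicroMedia share of all posts

# How over-represented MicroMedia is among sentimented posts.
over_representation = micro_share_of_sentimented / micro_share_of_all

# Estimated chance that a post gets marked with sentiment, by medium group.
micro_marking_rate = (
    sentimented_share_of_all * micro_share_of_sentimented / micro_share_of_all
)
other_marking_rate = (
    sentimented_share_of_all
    * (1 - micro_share_of_sentimented)
    / (1 - micro_share_of_all)
)

print(f"MicroMedia over-representation: {over_representation:.1f}x")
print(f"MicroMedia posts marked:        {micro_marking_rate:.0%}")
print(f"All other posts marked:         {other_marking_rate:.0%}")
```

By this estimate, MicroMedia shows up among sentimented posts a little over twice as often as its share of all posts would suggest. Roughly a third of MicroMedia posts were marked with sentiment, compared with about 8% of posts from all other mediums combined.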

Sentiment Trending Over Time

Given the vagaries of natural language processing as it exists today (and as it’s likely to exist for some time to come), I frequently hear that the primary useful application of automatic sentiment analysis is the trending of sentiment over time. The theory is that even if the sentiment marking is inaccurate (even by an order of magnitude), we can still track it over time and watch the pattern for changes, on the assumption that the level of inaccuracy will be consistent over time.

I am not fully convinced by this argument because I think that the inaccuracy can vary in different directions (toward the positive or negative side) depending on the day/week/month or any time frame one compares. As we saw, one frequently retweeted post dramatically (and incorrectly) skewed our results. However, I am interested to hear all of your opinions on this and whether you think automatic sentiment analysis can be useful in this way.

Until natural language processing improves, I think the only accurate solution is likely a human reader of every post (not just the ones already marked with sentiment). With the explosion in volume of posts on many topics, this is rarely a viable solution simply because of the cost of doing so. I will continue to watch this space, as the results may eventually be groundbreaking, and I will still be placing a heavy bet on Radian6 to be the company that finally pieces together the sentiment puzzle.

Ignite Social Media