Written by: Nicole Watson and Henry Naish
Disclaimer: This blog post solely reflects the opinion of the author and should not be taken to represent the general views of IPPR’s management team or those of fellow authors.
Can a computer be taught to understand human emotions? Proponents of sentiment analysis would argue that it most certainly can. Put simply, sentiment analysis is the use of algorithms to extract emotional meaning from large volumes of text. From its roots in market research, sentiment analysis is emerging as a promising technique in fields ranging from social science to stock market analysis. It allows us to process the emotions expressed in vast amounts of text more quickly than a human ever could. That is, if it really works. To put sentiment analysis to the test, we investigated how the results of a sentiment analysis algorithm measured up to human judgements when ranking news articles about the US economy.
For this project, we used a dataset provided by the platform Crowdflower, consisting of thousands of articles about the US economy. Human contributors ranked each article on a scale of 1 to 9, where 1 is the most negative and 9 the most positive.
Before we look at some findings, we should remind ourselves of an important point when using computers to do our reading for us. When it comes to text, humans are always right. Words have meaning for us, not computers, and these techniques are a shortcut to understanding a large volume of documents that we might not have time to read properly ourselves.
We found that even when instructed to rank articles on a scale of 1 to 9, humans still tend to see sentiment in binary terms: optimistic and pessimistic. The red plot shows human responses: the two peaks cluster around moderately pessimistic and optimistic sentiment. Humans don’t like extremes – we’re always thinking the next article could be even better or even worse than the last.
By contrast, our algorithm clusters articles around a moderately pessimistic level. Working off a ‘sentiment lexicon’ – a dictionary mapping words to scores based on how positive or negative they are – it scores each word and aggregates the results to determine how positive the article is overall. It produces a distribution of articles around the most frequent level of optimism. In other words, computers think in distributions while we think about how an article makes us feel.
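To make the mechanism concrete, here is a minimal sketch of lexicon-based scoring in Python. The lexicon, the word scores, and the rescaling to the 1–9 human scale are all illustrative assumptions, not our actual model; real projects would typically use a published lexicon such as AFINN or the NRC Emotion Lexicon.

```python
# Hypothetical sentiment lexicon for illustration: words mapped to
# scores from -2 (very negative) to +2 (very positive).
LEXICON = {
    "growth": 2, "recovery": 2, "gains": 1, "stable": 1,
    "recession": -2, "crisis": -2, "decline": -1, "losses": -1,
}

def score_article(text, scale=(1, 9)):
    """Score each known word, average, and rescale onto the human 1-9 scale."""
    words = text.lower().split()
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        # No lexicon words found: fall back to the midpoint (neutral).
        return (scale[0] + scale[1]) / 2
    avg = sum(hits) / len(hits)            # average word score, in [-2, 2]
    lo, hi = scale
    return lo + (avg + 2) * (hi - lo) / 4  # map [-2, 2] onto [1, 9]

print(score_article("Strong growth and recovery despite minor losses"))  # 7.0
```

Note how the aggregation step throws information away: a single averaged number cannot distinguish an article that is uniformly mild from one that mixes strong optimism with strong pessimism, which is part of why the algorithm's scores cluster into a distribution rather than tracking how an article feels overall.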
When comparing how the algorithm and humans scored each article, we found that for positive articles, the computer was more often wrong than right. The graph below shows the accuracy of the computer’s scores: blue for agreement with human judgment and red for disagreement. Whilst the blue streak down the middle shows some agreement between human and machine, the red in the top left indicates that the algorithm struggles to pick up positivity. Clearly, humans communicate optimism in ways that can’t be captured through analysis of single words.
Whilst the power of sentiment analysis cannot be dismissed on the basis of our simplistic model, it’s clear that there’s a long way to go before our computers are doing our reading for us. By focusing solely on the use of specific words, the computer misses nuance that humans understand effortlessly. Although computers will always surpass us when it comes to reading speed, what is this worth when they just don’t get it?