Channel: I Tech, Therefore I Am » Big Data

Self-Reflection: A Quick Twitter Sentiment Analysis


I took a look back at four years' worth of tweets in my last post. Some have asked how I found the theme of praise vs. thanks, so I wrote it down.

Your Turn

Here are the steps I took (from an OS X system):

At this point, I just looked around.

Load the index.html file left in the tweets/ directory and take a few minutes to review your time online. I noticed a lot of thank you’s from day one, so I dug deeper.

To get a comparison of times of praise versus times of thanks, I compared two values:

# egrep -i "thanks|thank you" dnd_tip_tweets.csv | wc -l

Followed by:
# egrep -i "gladly|my pleasure|np|sure thing|you're welcome" dnd_tip_tweets.csv | wc -l

To break this down a little:

  • egrep -i runs a case-insensitive search of the csv for lines containing any of these phrases
  • The pipe to wc -l counts the lines in the csv that matched (each matching line counts once, even if it contains several of the phrases)
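One caveat with these patterns: a short token such as np also matches inside longer words (for example, "input"), which can inflate the count. Here is a minimal sketch of a stricter count, assuming your grep supports the -w (whole words only) and -c (count matching lines) flags, as both GNU and BSD grep do. The function name count_matches is made up for illustration.

```shell
# count_matches PATTERN FILE
# Hypothetical helper: prints the number of lines in FILE that match PATTERN
# as whole words, ignoring case.
count_matches() {
  # -E extended regex, -i ignore case, -w whole words only, -c count lines.
  # grep exits non-zero when nothing matches, so "|| true" keeps a script
  # running under "set -e"; the printed count is still 0 in that case.
  grep -E -i -w -c -- "$1" "$2" || true
}
```

Calling count_matches "gladly|my pleasure|np|sure thing|you're welcome" dnd_tip_tweets.csv should then report a count that ignores words which merely contain "np".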

I found that the number of times I’ve offered thanks was roughly three times the number of times I’ve used language that accepts praise (1120 vs 360). It stood out to me and gave me enough data points to feel confident. To make sure your difference is large enough to be meaningful, you can use a site such as isvalid.org to check statistical significance. My example is below.
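As a rough cross-check that does not need a website, here is a sketch using awk. It assumes a chi-square test with one degree of freedom for whether the two counts are equally likely. That choice of test is an assumption for illustration, not necessarily what the site computes, and it treats every matching line as an independent observation. The helper names ratio and chi_square are made up.

```shell
# ratio A B
# Hypothetical helper: prints A / B to two decimal places.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# chi_square A B
# Hypothetical helper: prints the chi-square statistic for the null
# hypothesis that both counts are equally likely (expected (A + B) / 2 each),
# which simplifies to (A - B)^2 / (A + B).
chi_square() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) ^ 2 / (a + b) }'
}

ratio 1120 360        # prints 3.11
chi_square 1120 360   # prints 390.27
```

With one degree of freedom, a statistic above 3.84 corresponds to p < 0.05, so under these assumptions a gap of 1120 vs 360 is very unlikely to be chance.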

Using isvalid.org to show significance.

A side note on word choices: these are phrases I use frequently (‘sure thing’). To be most useful, you should search for your own patterns.
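To see which of your own phrases actually carry the total, a per-phrase breakdown can help. This is a minimal sketch, assuming a POSIX shell and a grep with the -F (literal string) flag. The function name phrase_counts is hypothetical.

```shell
# phrase_counts FILE PHRASE...
# Hypothetical helper: prints "count<TAB>phrase" for each phrase given,
# sorted with the most frequent phrase first.
phrase_counts() {
  file="$1"
  shift
  for phrase in "$@"; do
    # -F literal string, -i ignore case, -c count matching lines.
    # "|| true" covers grep's non-zero exit status when the count is 0.
    printf '%s\t%s\n' "$(grep -F -i -c -- "$phrase" "$file" || true)" "$phrase"
  done | sort -rn
}
```

For example, phrase_counts dnd_tip_tweets.csv "gladly" "my pleasure" "sure thing" lists each phrase with its own line count, so you can drop the ones you never use and add the ones you do.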

 

