I know I mentioned in my last post that I was planning to keep working on the Chicago data, but I got distracted by a cool bit of code that April Chen sent me, which pulls posts from Twitter into R for text analysis. You can enter a specific search term or pull in a specific user’s past posts. It seems to be somewhat limited: you can only receive up to 199 tweets per request, and it only seems to let you go back about one week. I was able to get decent sample sizes of roughly 1,000 tweets per search term by requesting a sample from each day of the past week.
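The day-by-day sampling trick is easy to sketch. April's R code isn't reproduced here, so the following is only a minimal Python illustration of the idea, assuming a hypothetical `fetch(term, since, until, limit)` function that stands in for whatever search call the real script makes. The 199-tweet cap and the one-week window are the limits described above.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

MAX_PER_REQUEST = 199  # per-request cap observed in the post
DAYS_BACK = 7          # the search only seems to reach about one week back

def daily_windows(today: date, days_back: int = DAYS_BACK) -> List[Tuple[date, date]]:
    """Return one (since, until) date pair per day, oldest first."""
    return [(today - timedelta(days=d), today - timedelta(days=d - 1))
            for d in range(days_back, 0, -1)]

def sample_week(term: str,
                fetch: Callable[[str, date, date, int], List[Dict]],
                today: date) -> List[Dict]:
    """Request up to MAX_PER_REQUEST tweets for each day and merge the results.

    `fetch` is a hypothetical stand-in for the real search call; it must return
    a list of dicts that each carry an "id" key.
    """
    seen: Dict[str, Dict] = {}
    for since, until in daily_windows(today):
        for tweet in fetch(term, since, until, MAX_PER_REQUEST):
            seen.setdefault(tweet["id"], tweet)  # keep the first copy of each tweet
    return list(seen.values())
```

Seven requests of up to 199 tweets each gives a ceiling of 1,393 tweets per term, which is consistent with landing around 1,000 in practice. Deduplicating by id is just a safety net in case the daily windows ever overlap.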

Here are a few word clouds I created, showing which other words commonly appear alongside certain hashtags:


(Word clouds: #datascience and #blessed)
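A word cloud like these sizes each word by how often it appears in the collected tweets. As a rough Python sketch (not the R code used for the clouds), the counting step might look like the following. The tiny stopword list and the token pattern are illustrative assumptions; real analyses use a much longer stopword list.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative stopword list only; a real analysis would use a much longer one.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "for",
             "on", "rt", "i", "my", "it", "this"}

def top_cooccurring_words(tweets: Iterable[str], hashtag: str,
                          k: int = 10) -> List[Tuple[str, int]]:
    """Count words that appear alongside `hashtag`, skipping the hashtag itself,
    URLs, @mentions, and stopwords. A word cloud sizes words by counts like these."""
    tag = hashtag.lower().lstrip("#")
    counts: Counter = Counter()
    for text in tweets:
        # Drop links and @mentions before tokenizing.
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        for word in re.findall(r"[a-z']+", text):
            if word != tag and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(k)
```

Excluding the search hashtag matters: it appears in every tweet by construction, so leaving it in would just produce one giant word in the middle of the cloud.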


I also made a few comparison clouds, which show how the frequency of words differs between searches.
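R's wordcloud package provides `comparison.cloud()`, which, as I understand its documentation, computes each word's rate in each group of tweets, subtracts that word's average rate across groups, and sizes the word by its largest positive deviation, placing it in the group where that maximum occurs. Here is a minimal Python sketch of that scoring idea; it assumes the tweets have already been tokenized and that every group is non-empty.

```python
from collections import Counter
from typing import Dict, List, Tuple

def comparison_scores(corpora: Dict[str, List[str]]) -> Dict[str, Tuple[str, float]]:
    """For each word, return (group, deviation): the group where the word's rate
    most exceeds its average rate across all groups, and by how much.

    `corpora` maps a group name (e.g. a search term) to its list of word tokens;
    each list is assumed to be non-empty.
    """
    # Rate = count of the word / total tokens in that group.
    rates = {name: {w: c / len(words) for w, c in Counter(words).items()}
             for name, words in corpora.items()}
    vocab = set().union(*(r.keys() for r in rates.values()))
    scores: Dict[str, Tuple[str, float]] = {}
    for w in vocab:
        per_group = {g: rates[g].get(w, 0.0) for g in rates}
        avg = sum(per_group.values()) / len(per_group)
        best = max(per_group, key=lambda g: per_group[g] - avg)
        scores[w] = (best, per_group[best] - avg)
    return scores
```

Using rates rather than raw counts keeps a larger sample from dominating the cloud just because it contains more words.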

Here is #liberal vs. #conservative:


And here is a cloud comparing tweets from Progressive Insurance’s corporate Twitter account with tweets from customers who mention @Progressive:


This cloud shows that Flo is a popular term among customers, which at first glance would lead me to believe that customers love Flo. However, the word “bitch” also shows up in the cloud. Could these be related?


The chart above displays the top 15 words in the cloud, and the connections between words show which ones are correlated with each other. Indeed, customers think Flo is a bitch 🙁
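Correlations like these are usually computed over a document-term matrix; for example, `findAssocs()` in R's tm package reports the terms whose correlation with a given term exceeds a threshold. The Python sketch below shows the underlying idea under simple assumptions: each tweet is one document, a term is marked 1 if it appears in the tweet and 0 otherwise, and two terms are compared with Pearson correlation. The function names are hypothetical.

```python
import math
import re
from typing import List

def presence_vector(tweets: List[str], term: str) -> List[int]:
    """1 for each tweet that contains `term` as a word, else 0."""
    return [1 if term in re.findall(r"[a-z']+", t.lower()) else 0 for t in tweets]

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A term found in every tweet (or none) has no defined correlation;
        # this sketch reports 0.0 by convention.
        return 0.0
    return cov / (sx * sy)

def term_correlation(tweets: List[str], a: str, b: str) -> float:
    """Correlation between the presence of words `a` and `b` across tweets."""
    return pearson(presence_vector(tweets, a), presence_vector(tweets, b))
```

Presence/absence is a reasonable simplification for tweets: they are short enough that a word rarely appears more than once, so 0/1 vectors carry nearly the same information as raw counts.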