How to Collect Tweets for Analysis

To analyze how a service or product is received in a market, many people rely on traditional methods such as market surveys and focus group interviews (FGI). However, these methods are costly and are constrained by the time and place needed to design the research, from laying out the questionnaire to recruiting respondents. A simpler way to conduct this kind of analysis is to use Python and R.

This analysis consists of two parts: gathering required data for analysis and analyzing sentiment and preferences based on the data.

1. Overview

Before starting, it is necessary to clarify what we want to know and how to get the right information to make a decision. The primary objective of this work is to quantify how positively or negatively customers feel about the product or service. To that end, we need to obtain data from social media at no cost and free of the bias that a survey coordinator or other participants can introduce.

Twitter is preferable to Facebook here because its data can be gathered more easily with Python libraries. In particular, Tweepy lets analysts gather relevant tweets through Twitter’s open API.

2. Creating an app

First and foremost, you have to create an app in Twitter’s developer center in order to gather tweets. Log into Twitter (dev.twitter.com) and click “manage your apps” under “Tools” at the bottom of the page.

Click “Create New App” and fill in the form. You can enter a placeholder address in the website field unless you need to connect the app to your public website or blog.

Open the app you created and go to the “Keys and Access Tokens” tab (the page may load there automatically). Under “Token actions” at the bottom of the page you will see a “Create my access token” button. Click it; a warning message may appear while you wait for the authorization to complete.

You can browse the app’s information by clicking through its tabs.

3. Tweepy streaming

Install Tweepy by typing pip install tweepy. Then go to the Tweepy GitHub site, where you will find “streaming.py” in the examples folder.

You do not need to know much Python, since you can use most of the code in the file as it is. Copy the contents of the file into a new file and name it whatever you like. The one important thing you have to do is to put the keys and access tokens from your Twitter app into the following four blanks in streaming.py.

consumer_key="UF12XXXXXXXX"
consumer_secret="    "
access_token="     "
access_token_secret="    "
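
Pasting the keys directly into the file works, but it is easy to share them by accident along with the script. As a minimal sketch of an alternative, the four values could be read from environment variables instead. The variable names below (TWITTER_CONSUMER_KEY and so on) and the load_credentials helper are hypothetical names chosen for this example, not something Tweepy or Twitter requires.

```python
import os

# Hypothetical environment variable names; any names work as long as
# they match what you export in your shell.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

With this in place, the four assignments could become something like consumer_key = load_credentials()["consumer_key"], and the script fails early with a clear message if a value has not been set.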

4. Filtering keywords

The final line of the code contains the keyword used to filter the data. The default keyword is “basketball”; change it according to the objective of your analysis. If you want to track several keywords at the same time, for example to compare rivals or competitors, replace the original line with the following lines.

# keywords must be defined before stream.filter() uses it
keywords = ["Samsung", "Apple"]
stream = Stream(auth, l)
stream.filter(track=keywords)
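
When several keywords are tracked at once, all matching tweets arrive in a single stream, so comparing rivals means working out which keyword each tweet mentions. The following is only a rough sketch of that step, assuming the tweet text is already available as a plain string. The helper names matched_keywords and count_mentions are made up for this example, and the matching is a simple case-insensitive substring check.

```python
from collections import Counter


def matched_keywords(text, keywords):
    """Return the keywords that occur in text, ignoring case.

    This is a plain substring check, so "Apple" also matches "pineapple".
    """
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def count_mentions(texts, keywords):
    """Count how many of the given texts mention each keyword."""
    counts = Counter({kw: 0 for kw in keywords})
    for text in texts:
        counts.update(matched_keywords(text, keywords))
    return counts


if __name__ == "__main__":
    # Made-up sample texts, used only to show the call.
    sample = ["I love my new Samsung phone", "Apple or samsung?"]
    print(count_mentions(sample, ["Samsung", "Apple"]))
```

A tweet that mentions both brands is counted once for each, which seems reasonable for a first comparison but may need refining for a real analysis.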

If you run the Python file, it starts gathering tweets. This is really awesome. However, you may want to organize the data in a more structured way by using JSON (http://en.wikipedia.org/wiki/JSON). There are several ways to save the gathered data in a text file as JSON. I may cover them later.
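
In the meantime, here is a minimal sketch of one such way: writing each tweet as one JSON object per line in a text file, then reading the file back for analysis. It assumes each tweet reaches your code as a raw JSON string, which is what the listener in streaming.py appears to receive. The function names append_tweet and load_tweets and the file name tweets.txt are hypothetical, and the sample payload is invented for illustration.

```python
import json


def append_tweet(raw_json, path):
    """Append one tweet, given as a raw JSON string, to path as a single line."""
    tweet = json.loads(raw_json)  # raises ValueError if the payload is not valid JSON
    with open(path, "a", encoding="utf-8") as f:
        # json.dumps escapes newlines inside strings, so each tweet stays on one line
        f.write(json.dumps(tweet, ensure_ascii=False) + "\n")


def load_tweets(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Invented sample payload; real tweets carry many more fields.
    append_tweet('{"text": "Samsung or Apple?", "lang": "en"}', "tweets.txt")
    for tweet in load_tweets("tweets.txt"):
        print(tweet["text"])
```

Keeping one tweet per line means the file can be appended to while the stream is running and later loaded line by line, without holding the whole collection in memory at once.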
