Categories
Data Science

Maximum-likelihood Estimation in R

If you are a marketer for a sports team and your mission is to boost sales of annual memberships for home games, what do you do first? You may want to know which factors to focus on to encourage customers to renew their annual membership. Actually, you need to wonder what is […]

Categories
Data Science

Principal Component Analysis & Factor Analysis in R

Let’s say there is a chunk of survey data consisting of more than fifty questions, and the total number of respondents reaches 60,000. It will probably take you a lot of time to analyze the data according to your original intention or analysis objectives. Most people try to classify data or divide them into pieces […]

Categories
Data Science

RFM: Simple & Efficient Way to Focus on Highly-responded Customers

When you want to focus on target segments with a high response rate, RFM is one of the most useful methods. Above all, RFM is intuitive and yields results easily because it is a heuristic analysis rather than a regression model. RFM is an acronym […]

Categories
Data Science

How to Interpret Texts

Once you have data, whether obtained from social media or other sources, the next step is to analyze what those data mean. However, natural language is difficult to interpret for sentiment analysis. Fortunately, there are several ways to understand text data and even quantify it. TextBlob […]

Categories
Data Science

How to Collect Tweets for Analysis

To analyze how a service or product is received in a market, many people have relied on traditional methods such as market surveys and FGI (focus group interviews). However, these methods are costly and are constrained by the space and time needed to design the research, from laying out the questionnaire to recruiting survey respondents. There is a simpler […]

Categories
Data Science

[Clustering Analysis in R] #4. Data Analysis

Finally, we can move on to the clustering analysis itself, which separates customers by their characteristics and finds the representative tendency of each group (segment). To that end, I will use two approaches: k-means and hierarchical clustering analysis. The former is to find if independent groups have high similarity from their representative observation […]

Categories
Data Science

[Clustering Analysis in R] #3. Data Diagnostics

Now we need to diagnose whether these data are adequate for analysis, so that the results do not stem from a biased sample distribution or correlated variables. To that end, a multicollinearity test clarifies the correlation between independent variables, and I used corrgram() from the corrgram package in R for that purpose. > install.packages("corrgram") […]

Categories
Data Science

[Clustering Analysis in R] #2. Data Processing

By and large, there are two types of data: quantitative and qualitative. If you want to do any kind of analysis, such as regression or classification, you need to transform qualitative data into quantitative data. Most categorical data can be transformed into dummy variables. For instance, male can be zero and female […]

Categories
Data Science

[Clustering Analysis in R] #1. Introduction & Data Gathering

Before Starting the Process: I started writing these posts because, as a beginner and learner of data science, I wanted to share my knowledge about clustering analysis and develop it through active discussion. The main objective of these posts is to understand the whole process from data gathering to […]

Categories
Business

Most Important Thing for Card Issuers during Big Data War

There was an interesting article about the big-data-based businesses of major credit card companies in Korea. According to the Korea Economic Daily, two major card companies, Shinhan and Samsung, see big data as a new business opportunity, while Hyundai Card is skeptical about it. Recently, Shinhan and Samsung have each been pushing their data-based services. […]