Market Basket Analysis: Apriori Algorithm in R, Part 01

At work, I like the idea of unconventional data exploration. Market basket analysis comes to mind, perhaps because it is simple to explain to your sponsors or superiors, and also because it is relatively easy to implement. The steps outlined below are for R users and will get you from Excel-like data (i.e. text input copied to the clipboard) to transactions and rules.

The arules package in R implements the Apriori algorithm, which constructs implication rules over itemsets. Suppose you have a set of items, and those items show up in various groupings. You would like to know whether there is any co-occurrence between items or sets of items. For example, if many groupings that contain item A also contain item B, we can take that as an implication rule (association rule). So, suppose your data consists of two columns in Excel: one column contains the groupings and the other contains the items observed in those groupings. Here is an example:

Groupings Items

A A
A E
A H
A I
A J
A K
A M
A Q
A U
A V
A X
B B
B E
B F
B I
B K
B L
B N
B O
B Q
B R
B V
B Y

So we have two groupings, A and B, and a number of items. For the apriori function to work, it must see the data as a set of "transactions": each transaction (row) corresponds to a grouping above, and each column corresponds to an item from the dataset. Whenever an item is present within a grouping/transaction, we enter 1, otherwise 0 (a so-called binary matrix). In our example, we have:

Grouping A B E F H I J K L M N O Q R U V X Y
A 1 0 1 0 1 1 1 1 0 1 0 0 1 0 1 1 1 0
B 0 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1
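
To make the structure concrete, here is a minimal base R sketch that builds the same matrix by treating each grouping as a plain vector of items and testing membership. This is only an illustration of the layout, not the route taken in the rest of the post, and the names baskets, items and m are mine.

```r
# Each grouping is just a character vector of the items it contains
# (same data as the two-column example above, typed in by hand).
baskets <- list(
  A = c("A", "E", "H", "I", "J", "K", "M", "Q", "U", "V", "X"),
  B = c("B", "E", "F", "I", "K", "L", "N", "O", "Q", "R", "V", "Y")
)

# Columns of the matrix: every distinct item, in alphabetical order.
items <- sort(unique(unlist(baskets)))

# One row per grouping: 1 if the item is present, 0 otherwise.
m <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(m) <- items

print(m)
```

The row sums of m give the number of distinct items per grouping (11 for A, 12 for B), which is a quick sanity check against the original two-column list.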

Again, columns are items (that is why there are so many of them in this example), rows are groupings (only two, just as in the data above) and values are counts or binary indicators, e.g. how many times does item Y occur in grouping B? Now, the question is: given the initial two-column format, how do we get the data to look like the matrix above? Copy the data in its original form, header row included, to the clipboard. At the R console, type:

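library(arules)  # provides the "transactions" class, apriori() and inspect()

# "clipboard" works on Windows; on macOS, pipe("pbpaste") is the usual substitute
d = read.table("clipboard", header=TRUE)  # [1] read the copied two-column data
t = table(d$Groupings, d$Items)  # [2] cross-tabulate groupings against items
d = as.data.frame.matrix(t)  # [3] one row per grouping, one column per item
tr = as(as.matrix(d), "transactions")  # [4] convert the 0/1 matrix to transactions
rules = apriori(tr, parameter=list(supp=0.95, conf=0.95, target="rules"))  # [5] mine the rules
inspect(rules)  # [6] example output: {I, K, V} => {Q}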

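The above instructions read the copied data into a data frame (line [1]), cross-tabulate it (line [2]), turn the cross-tabulation into a matrix-like data frame (line [3]), convert it to transactions (the input format that the arules package understands, line [4]), generate the rules (line [5]) and print them (line [6]). As you can see, the thresholds I chose are a minimum support (i.e. the share of transactions that contain the itemset) of 0.95 and a minimum confidence of 0.95; I am not filtering on lift. With only two transactions, a support of 0.95 simply means that every item in a rule must appear in both groupings; on real data you would normally set it much lower. Next time I will discuss the conceptual background behind association rule mining, in the context of R.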
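
To see what the support and confidence thresholds measure, the two quantities can be worked out by hand on the example data. The sketch below uses base R only, with the data typed in rather than read from the clipboard; support_of and confidence_of are hypothetical helper names of mine, not functions from arules.

```r
# Rebuild the example in the two-column layout (typed in, not read from the clipboard).
d <- data.frame(
  Groupings = rep(c("A", "B"), times = c(11, 12)),
  Items = c("A", "E", "H", "I", "J", "K", "M", "Q", "U", "V", "X",
            "B", "E", "F", "I", "K", "L", "N", "O", "Q", "R", "V", "Y"),
  stringsAsFactors = FALSE
)

# Cross-tabulate and reduce the counts to TRUE/FALSE presence flags.
m <- table(d$Groupings, d$Items) > 0

# Hypothetical helper: share of transactions that contain all of the given items.
support_of <- function(items) {
  mean(rowSums(m[, items, drop = FALSE]) == length(items))
}

# Hypothetical helper: confidence of lhs => rhs,
# i.e. support(lhs and rhs together) / support(lhs).
confidence_of <- function(lhs, rhs) {
  support_of(c(lhs, rhs)) / support_of(lhs)
}

support_of(c("I", "K", "V", "Q"))       # rule {I, K, V} => {Q}: 1
confidence_of(c("I", "K", "V"), "Q")    # 1
support_of("H")                         # H occurs in grouping A only: 0.5

# Single items that clear a 0.95 support threshold with two transactions
colnames(m)[colMeans(m) >= 0.95]        # "E" "I" "K" "Q" "V"
```

Only the five items shared by both groupings survive the 0.95 cut-off, which is why every rule reported for this toy dataset is built from E, I, K, Q and V.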

~ by Monsi.Terdex on November 1, 2013.

3 Responses to “Market Basket Analysis: Apriori Algorithm in R, Part 01”

  1. The page background is distracting, and does not look professional. Please change it to a more sober theme. Btw, the article was good. Thanks for posting.

  2. I think the background is great. It made me want to share this with you. http://www.glyphmatic.us. Its a generative art project using unicode.. And too, Thanks for posting this. Its exactly what I was looking for.

  3. As I understand it, the frequent itemsets—those which translate into discovered association rules—include only those occurring in a proportion of the data set’s transactions which exceeds the support parameter. Assuming I am correct about that, it seems you would rarely want to set the support so high as you do in this example, 0.95.
