Allrighty, it’s time to start digging in. In preparation for our specials campaign at this regional convenience store/gas station chain, we’ve got to figure out what people are buying.
Customers either pay at the pump, in which case they’re unlikely to come into the store, or they come inside to get the discount. Upgrading the pumps with snazzy new flat panel capabilities is prohibitive in both cost and lost revenue: too much $$ to upgrade and too much downtime. So we’re going to target customers as they come into the store with a large flat panel display, maybe 2 or 3 depending on how things go. We might also update the company’s website with location-specific specials.
But first we’ve got to look at real purchases and then start to prepare theoretical shopping baskets. How do we do this?
May I Have the Data Please
First we need data, lots of data. Luckily, we have a full year of transaction data to look at. What we’re going to do is examine each transaction and start to compute a large matrix. There are about 3,000 products distributed over 20+ stores. We want to know, when one item is bought, how likely the customer is to buy another item along with it. And we want to know this by store and region. So we’ve got a few dimensions for the matrix. We might even introduce time of day. For example, someone coming into the store in the morning is far more likely to buy a paper, some coffee, and a breakfast item than someone coming in during the evening. We want to know that so we can target specific times of the day with specific specials.
Keeping it as Simple as We Can
To make matters a little easier, we’re only going to analyze each product in relation to one other product – if someone buys A then there is an X% probability of buying B. And we’ll store that in a database table that contains the store id, the time interval that we’re interested in, the product pair, and the probability of A => B.
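As a rough sketch of what that computation might look like, the snippet below counts product pairs per store and time interval and turns the counts into conditional probabilities. Everything named here is an assumption for illustration: the transaction layout `(store_id, hour, items)`, the `time_bucket` boundaries, and the sample data are made up, and the real table would be loaded from the year of transaction data rather than a list.

```python
from collections import defaultdict
from itertools import permutations


def time_bucket(hour):
    """Map an hour (0-23) to a coarse interval. Boundaries are hypothetical."""
    if 5 <= hour < 11:
        return "morning"
    if 11 <= hour < 17:
        return "midday"
    return "evening"


def pair_probabilities(transactions):
    """Estimate P(B | A) per store and time interval.

    transactions: iterable of (store_id, hour, items) tuples (assumed layout).
    Returns rows of (store_id, interval, product_a, product_b, probability),
    which is the shape of the database table described above.
    """
    count_a = defaultdict(int)   # baskets containing A
    count_ab = defaultdict(int)  # baskets containing both A and B

    for store_id, hour, items in transactions:
        key = (store_id, time_bucket(hour))
        basket = set(items)  # ignore quantities; we only care about presence
        for a in basket:
            count_a[key + (a,)] += 1
        for a, b in permutations(basket, 2):  # ordered pairs: A => B and B => A
            count_ab[key + (a, b)] += 1

    rows = []
    for (store_id, interval, a, b), n_ab in sorted(count_ab.items()):
        n_a = count_a[(store_id, interval, a)]
        rows.append((store_id, interval, a, b, n_ab / n_a))
    return rows


# Made-up sample data, purely for illustration.
sample = [
    (1, 7, ["coffee", "paper", "muffin"]),
    (1, 8, ["coffee", "paper"]),
    (1, 8, ["coffee"]),
    (1, 19, ["soda", "chips"]),
]

for row in pair_probabilities(sample):
    print(row)
```

Note that A => B and B => A are stored as separate rows, because the two probabilities generally differ: in the sample, everyone who buys a paper buys coffee, but not everyone who buys coffee buys a paper.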
Why?
Because we’re going to use the inverse of these probabilities as the distance function for some cluster analysis later on: the more likely two products are to be bought together, the closer together they sit. The clusters, each with a given total probability of the basket being purchased, are what we’ll use to target our specials campaign.
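To make the distance idea concrete, here is a minimal sketch of one way it could work. It is not the clustering we’ll actually settle on. It assumes the distance is 1 / probability, makes the distance symmetric by taking the larger of P(B | A) and P(A | B) (an assumption, since the stored probabilities are directional), and groups products with a simple threshold-based single-linkage pass. The probabilities and the `max_distance` cutoff are invented for the example.

```python
def inverse_distance(prob):
    """Distance as the inverse of a probability; never-seen pairs are infinitely far."""
    return float("inf") if prob <= 0 else 1.0 / prob


def symmetric_distance(probs, a, b):
    """Assumed symmetrization: use the stronger of the two directions."""
    p = max(probs.get((a, b), 0.0), probs.get((b, a), 0.0))
    return inverse_distance(p)


def single_linkage_groups(products, probs, max_distance):
    """Group products whose pairwise distance is within max_distance (union-find)."""
    parent = {p: p for p in products}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, a in enumerate(products):
        for b in products[i + 1:]:
            if symmetric_distance(probs, a, b) <= max_distance:
                parent[find(a)] = find(b)

    groups = {}
    for p in products:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Invented probabilities for one store and one time interval.
probs = {
    ("coffee", "paper"): 0.67,
    ("paper", "coffee"): 1.0,
    ("coffee", "muffin"): 0.33,
    ("soda", "chips"): 0.8,
    ("chips", "soda"): 0.5,
}
products = ["coffee", "paper", "muffin", "soda", "chips"]

baskets = single_linkage_groups(products, probs, max_distance=2.0)
print(baskets)
```

With a cutoff of 2.0 (a pair probability of at least 50%), coffee and paper land in one candidate basket, soda and chips in another, and the muffin stays on its own. Loosening the cutoff pulls weaker associations into the same basket.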
Where’s the CEP?
Hang in there. First we need to find out where we are. When we turn on the engine for both specials selection and monitoring, it’s important that the historical data is sliced into relevant time intervals. Otherwise we’ll end up comparing apples and oranges as we look at the expected/actual performance of a special versus the historical averages.
What’s Next?
We’re going to get this big matrix computed, and then we’re going to look at it in Tableau.