Category Archives: Dimensional Data Mart

Convenience Store 2.0 – Probabilities

Allrighty, it’s time to start digging in.  In preparation for our specials campaign at this regional convenience store/gas station chain, we’ve got to figure out what people are buying.

First, customers can either pay at the pump, in which case it’s unlikely that they actually come into the store.  Or, to get the discount, they do come into the store.  Upgrading the pumps with snazzy new flat panel capabilities is cost and revenue prohibitive – too much $$ to upgrade and too much down time.  So, we’re going to target customers as they come into the store with a large flat panel display – maybe 2 or 3 depending on how things go.  Also, we might update the company’s website with location specific specials as well.

But, in preparation, we’ve got to figure out what people are buying.  We’ve got to look at real purchases and then start to prepare theoretical shopping baskets.  How do we do this?

May I Have the Data Please

First we need data – lots of data.  Luckily, we have all of the transaction data for a year to look at.  What we’re going to do is examine each transaction and start to compute a large matrix.  There are about 3,000 products distributed over 20+ stores.  We want to know when one item is bought, what is the probability of the customer purchasing another item.  And we want to know this by store and region.  So we’ve got a few dimensions for the matrix.  We might even introduce time of day – for example, it’s far more likely in the morning that someone coming into the store might buy a paper, some coffee, and a breakfast item than they would in the evening.  We want to know that so that we can target specific times of the day with specific specials.

Keeping it as Simple as We Can

To make matters a little easier, we’re only going to analyze each product in relation to one other product – if someone buys A then there is an X% probability of buying B.  And we’ll store that in a database table that contains the store id, the time interval that we’re interested in, the product pair, and the probability of A => B.

Why?

Because we’re going to use these probabilities (the inverse actually) as the distance function for some cluster analysis later on.  The cluster analysis, with a given total probability of purchasing the basket, will target our specials campaign.

Where’s the CEP

Hang in there – first we need to find out where we are.  It’s important when turning on the engine for both specials selection and monitoring, that we have the historical data sliced into relevant time intervals – otherwise we’ll end up comparing apples and oranges as we look at expected/actual performance of a special versus the historical averages.

What’s Next?

We’re going to get this big matrix computed, and then we’re going to look at it in Tableau.

Convenience Store 2.0 – Questions!

Before we get too far down the technology path with our Convenience Store 2.0 project, let’s decide what we want to do.

And before describing what we want to do (increase turnover, but how?), let’s figure out what we’ve got first.

Just Give Me Your Data

This convenience store chain has a network connecting roughly 20 stores in 2 states.  Each transaction is sent back to the ranch to be stored in their database.  Inside of the database, we have tables for:

  • Companies (they own more than one brand)
  • Regions
  • Stores
  • Products
  • Transaction Headers
  • Transaction Details
  • Costs (calculating cost in this environment isn’t as easy as one might think)
  • Prices
  • Employees

This collection of tables roughly represents what I call the ‘Operational Data Store’ or their Online Transaction Processing region.  Missing from this, and what we’ll add in the ‘Dimensional Data Mart’ region, are tables that describe additional dimensions for analysis, like time (this one is important and will become even more important when we bring this whole thing into a real-time analytics mode).

And The Questions Are?

We’ve got a lot of questions, but we’re going to start with something very simple.  But first, our goal again is to increase turnover without cannibalizing and to increase turnover in products with higher margin.  This is important – this simple requirement will end up dictating how much money we can spend on the solution given the expected results (and we’ll model that too).

So, here are some initial and simple questions that we’d like answers to:

  • When someone pays for gas inside (discount), what items do they buy?
  • When someone buys a particular item, what other items might they buy?
  • How does this differ by region, city, state?
  • How does this differ by time of day, day of week, etc?

This will begin to give us some insight into the spending patterns of the customers.  And might lead to some very interesting decisions – like should we have meta-regions based upon buying habits and not just geographic location?  Can we prune items from inventory shifting capital to more productive products or product lines?

Also, this work will begin to lay the foundation for our real time marketing campaigns.  Location based stuff isn’t going to work (unless we implant RFID tags in their customers), so we’ll have to go on a store by store basis.  The end goal (increase turnover in higher margin products) will be supported with a large flat panel screen in the entry way of each store (one or two to test) that will flash specials.  We need to make sure that the specials are working, kill them if they’re not, and let them run longer if they are.

Convenience Store 2.0 – Dimensional Data Mart & Tableau

We’re going to build a dimensional data mart for Convenience Store 2.0. To keep things really simple, I will use Access 2003 as the data store (accessible via ODBC) and Tableau for analysis.  If you’re going to follow along during this series, you might consider downloading a trial of Tableau and getting up to speed.  I’ll be making the Access database available as well soon.