Category Archives: OLAP

Convenience Store 2.0 – Probabilities

Allrighty, it’s time to start digging in.  In preparation for our specials campaign at this regional convenience store/gas station chain, we’ve got to figure out what people are buying.

First, customers can either pay at the pump, in which case it’s unlikely that they actually come into the store.  Or, to get the discount, they do come into the store.  Upgrading the pumps with snazzy new flat panel capabilities is cost and revenue prohibitive – too much $$ to upgrade and too much down time.  So, we’re going to target customers as they come into the store with a large flat panel display – maybe 2 or 3 depending on how things go.  Also, we might update the company’s website with location specific specials as well.

But, in preparation, we’ve got to figure out what people are buying.  We’ve got to look at real purchases and then start to prepare theoretical shopping baskets.  How do we do this?

May I Have the Data Please

First we need data – lots of data.  Luckily, we have all of the transaction data for a year to look at.  What we’re going to do is examine each transaction and start to compute a large matrix.  There are about 3,000 products distributed over 20+ stores.  We want to know when one item is bought, what is the probability of the customer purchasing another item.  And we want to know this by store and region.  So we’ve got a few dimensions for the matrix.  We might even introduce time of day – for example, it’s far more likely in the morning that someone coming into the store might buy a paper, some coffee, and a breakfast item than they would in the evening.  We want to know that so that we can target specific times of the day with specific specials.

Keeping it as Simple as We Can

To make matters a little easier, we’re only going to analyze each product in relation to one other product – if someone buys A then there is an X% probability of buying B.  And we’ll store that in a database table that contains the store id, the time interval that we’re interested in, the product pair, and the probability of A => B.

Why?

Because we’re going to use these probabilities (the inverse actually) as the distance function for some cluster analysis later on.  The cluster analysis, with a given total probability of purchasing the basket, will target our specials campaign.

Where’s the CEP

Hang in there – first we need to find out where we are.  It’s important when turning on the engine for both specials selection and monitoring, that we have the historical data sliced into relevant time intervals – otherwise we’ll end up comparing apples and oranges as we look at expected/actual performance of a special versus the historical averages.

What’s Next?

We’re going to get this big matrix computed, and then we’re going to look at it in Tableau.

Convenience Store 2.0 – Questions!

Before we get too far down the technology path with our Convenience Store 2.0 project, let’s decide what we want to do.

And before describing what we want to do (increase turnover, but how?), let’s figure out what we’ve got first.

Just Give Me Your Data

This convenience store chain has a network connecting roughly 20 stores in 2 states.  Each transaction is sent back to the ranch to be stored in their database.  Inside of the database, we have tables for:

  • Companies (they own more than one brand)
  • Regions
  • Stores
  • Products
  • Transaction Headers
  • Transaction Details
  • Costs (calculating cost in this environment isn’t as easy as one might think)
  • Prices
  • Employees

This collection of tables roughly represents what I call the ‘Operational Data Store’ or their Online Transaction Processing region.  Missing from this, and what we’ll add in the ‘Dimensional Data Mart’ region, are tables that describe additional dimensions for analysis, like time (this one is important and will become even more important when we bring this whole thing into a real-time analytics mode).

And The Questions Are?

We’ve got a lot of questions, but we’re going to start with something very simple.  But first, our goal again is to increase turnover without cannibalizing and to increase turnover in products with higher margin.  This is important – this simple requirement will end up dictating how much money we can spend on the solution given the expected results (and we’ll model that too).

So, here are some initial and simple questions that we’d like answers to:

  • When someone pays for gas inside (discount), what items do they buy?
  • When someone buys a particular item, what other items might they buy?
  • How does this differ by region, city, state?
  • How does this differ by time of day, day of week, etc?

This will begin to give us some insight into the spending patterns of the customers.  And might lead to some very interesting decisions – like should we have meta-regions based upon buying habits and not just geographic location?  Can we prune items from inventory shifting capital to more productive products or product lines?

Also, this work will begin to lay the foundation for our real time marketing campaigns.  Location based stuff isn’t going to work (unless we implant RFID tags in their customers), so we’ll have to go on a store by store basis.  The end goal (increase turnover in higher margin products) will be supported with a large flat panel screen in the entry way of each store (one or two to test) that will flash specials.  We need to make sure that the specials are working, kill them if they’re not, and let them run longer if they are.

Event Processing Functions According to Opher

Opher wrote a good post on what functions an Event Processing System should have.

Of the four general (and comprehensive) functions, I like #4 (the first three are a given and rather pedestrian):

4. Situation discovery / event pattern discovery: This function is to discover that some situation occurs without having a predefined patterns, using intelligent techniques. While the first three types of functions are more investigated (although I can’t say that all issues are figured out), the fourth one is still a challenge, since there are some experiments, but generally it is not well established yet.

If we look at what functionality the current vendors provide in this area, the most advanced functionality seems to be some type of OLAP capability.  So if we were going to lay down a roadmap of how we get to #4 above, first we’ve got to figure out where we are, what’s required for #4 above, and figure out how to get there from here.

Live OLAP is certainly a good step in the right direction.  Real-time mashups may help as well.  Behind all of this is the internal operational data store – this is where all the transactions for an entity are processed.  It may be called the Online Transaction Processing function within a company.  Adjunct to this might be a large, denormalized Dimensional Data Mart to facilitate traditional OLAP.  Also important is what’s going on in the world outside of the organization – external events certainly can and do influence customers and companies in unique and different ways.  Consider for example, the power that Google wields over shopping decisions today.  If you’re looking for mistletoe belt buckles, there may be a couple hundred vendors who share your same depraved sense of humor but if they’re not on Google, you won’t find them.

It’s in this area of the Dimensional Data Mart that what is referred to as CEP today could provide the foundation for dynamic and event driven OLAP.  I don’t like the terms real-time and OLAP for this, but it’s the best the marketers could come with to describe event driven dimensional data analysis.  So we’ll live with it.

Visualization plays a large role in this – a CEP engine can slice and dice data to your processor’s content, but if you can’t view it, it don’t mean much.  So whatever tool a user decides to install on their desktop must be able to flexibly provide information in a condensed and meaning way; allowing dimensions, filters, slices, etc. to be moved around at will.  This means that those resulting queries need to be coordinated with the CEP engine – to both provide a snapshot or benchmark and event driven incremental updates.

I’m familiar with what a number of firms have done in this area – the most complex and meaningful solution being at Citi.  But, it’s ‘hard-coded.’  Is there any CEP vendor out there today that has abstracted these processes to a level where they might be generally applicable?  I doubt it.  But I bet Aleri’s Live OLAP comes close though.

But no CEP vendor today can claim anything remotely close to the vision that #4 above entails.  I look forward to Opher keeping us all on our toes.