Convenience Store 2.0 – Probabilities

Allrighty, it’s time to start digging in.  In preparation for our specials campaign at this regional convenience store/gas station chain, we’ve got to figure out what people are buying.

First, customers can either pay at the pump, in which case it’s unlikely that they actually come into the store, or come into the store to get the discount.  Upgrading the pumps with snazzy new flat panel capabilities is prohibitive on both cost and revenue – too much $$ to upgrade and too much down time.  So, we’re going to target customers as they come into the store with a large flat panel display – maybe 2 or 3 depending on how things go.  We might also update the company’s website with location-specific specials.

But first, we’ve got to look at real purchases and then start to prepare theoretical shopping baskets.  How do we do this?

May I Have the Data Please

First we need data – lots of data.  Luckily, we have all of the transaction data for a year to look at.  What we’re going to do is examine each transaction and start to compute a large matrix.  There are about 3,000 products distributed over 20+ stores.  We want to know, when one item is bought, the probability that the customer also purchases another item.  And we want to know this by store and region.  So we’ve got a few dimensions for the matrix.  We might even introduce time of day – for example, someone coming into the store in the morning is far more likely to buy a paper, some coffee, and a breakfast item than someone coming in during the evening.  We want to know that so that we can target specific times of the day with specific specials.

Keeping it as Simple as We Can

To make matters a little easier, we’re only going to analyze each product in relation to one other product – if someone buys A then there is an X% probability of buying B.  And we’ll store that in a database table that contains the store id, the time interval that we’re interested in, the product pair, and the probability of A => B.
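Here is a minimal sketch of how those pairwise probabilities might be counted.  The transaction layout (store id, interval, list of items), the function name, and the sample baskets are all assumptions for illustration, not the actual schema or data.

```python
from collections import Counter
from itertools import permutations


def pair_probabilities(transactions):
    """Estimate P(B in basket | A in basket) for every ordered pair (A, B).

    `transactions` is an iterable of (store_id, interval, items) tuples;
    this layout is hypothetical.
    Returns {(store_id, interval, a, b): probability}.
    """
    item_counts = Counter()  # baskets containing A, per store and interval
    pair_counts = Counter()  # baskets containing both A and B
    for store_id, interval, items in transactions:
        basket = set(items)  # count each product once per basket
        for a in basket:
            item_counts[(store_id, interval, a)] += 1
        for a, b in permutations(basket, 2):
            pair_counts[(store_id, interval, a, b)] += 1
    # Divide the joint count by the count of baskets containing A.
    return {
        key: count / item_counts[key[:3]]
        for key, count in pair_counts.items()
    }


# Made-up sample baskets, purely for illustration.
transactions = [
    (1, "morning", ["coffee", "paper"]),
    (1, "morning", ["coffee", "donut"]),
    (1, "morning", ["coffee", "paper", "donut"]),
    (1, "evening", ["beer", "chips"]),
]
probs = pair_probabilities(transactions)
```

Each key of the result maps directly onto a row of the table described above: store id, time interval, product pair, and the probability of A => B.  Note that the result is not symmetric: in the sample, everyone who bought a paper bought coffee, but not the other way around.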

Why?

Because we’re going to use these probabilities (the inverse actually) as the distance function for some cluster analysis later on.  The cluster analysis, with a given total probability of purchasing the basket, will target our specials campaign.
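As a rough sketch of what "the inverse" could look like as a distance, the function below takes the reciprocal of the probability.  Two things here are assumptions on my part: reading "inverse" as 1/p (1 - p is another common choice), and averaging the two directions (A => B and B => A) so that the distance is symmetric, which most clustering methods expect.

```python
import math


def pair_distance(prob_ab, prob_ba):
    """Symmetric distance between products A and B.

    Assumption: average the two directed probabilities, then take the
    reciprocal. Pairs never bought together are infinitely far apart.
    """
    p = (prob_ab + prob_ba) / 2
    return math.inf if p == 0 else 1.0 / p
```

Products that are almost always bought together end up close to distance 1, and rarely paired products drift far apart.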

Where’s the CEP

Hang in there – first we need to find out where we are.  Before turning on the engine for both specials selection and monitoring, it’s important that we have the historical data sliced into relevant time intervals.  Otherwise we’ll end up comparing apples and oranges as we look at expected/actual performance of a special versus the historical averages.

What’s Next?

We’re going to get this big matrix computed, and then we’re going to look at it in Tableau.

Convenience Store 2.0

Almost everyone everywhere has visited a convenience store.  For gas.  For a beverage.  Or maybe a pack of smokes.  Ever wonder what makes that extremely quick visit possible?  The answer is an Event Driven Architecture.  Let’s take a look.

Events Everywhere

You pull into a convenience store up to a gas pump (event).  You insert your credit card (event).  The credit card is authorized (multiple events).  You start pumping gas (event stream as total gas pumped increases).  And when you return the nozzle to the pump, more events are generated.  All of these events are displayed, in real time, on one of the registers inside of the store.  The clerk can intervene by authorizing the purchase, resetting the pump, cutting you off, or even turning off all of the pumps.  This is the simple case and it happens every time a customer pulls in to get gas.

More Events

Let’s assume that the convenience store has wisely decided to offer a 5 cent/gallon discount for paying inside of the store.  This incentive is designed to get you into the store – where you might just buy some more stuff.  Higher margin stuff.  Convenience stores are all about turnover – how many times per year the store can earn a return on every dollar invested.  The idea here is that you buy gas and the store makes 1 cent a gallon, but when you come inside, they’re making a much higher percentage margin on your purchases.  So, you’ve got gas on the pump when you come inside and decide to purchase a few additional items.  When you approach the till, the clerk must choose which pump you used (correlation), add that to your balance, and proceed to enter your basket into the till.  This is usually done with a scanner – ah ha!  More events.  As each item is scanned, the price is looked up and the description and price are registered.  If the item is on special, the special is displayed on the monitor.  Inventory is updated (events).  When limits are reached based upon predictive algorithms, orders are automatically placed for additional inventory (analytics).  And now you pay, with credit card or cash: even more events.
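The scan, inventory update, and automatic reorder steps can be sketched roughly like this.  It is only an illustration: the class, the event names, and the fixed reorder point are all hypothetical, and the fixed threshold is a stand-in for the predictive algorithms a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    stock: dict          # sku -> units on hand
    reorder_point: dict  # sku -> level that triggers a reorder (assumed fixed)
    events: list = field(default_factory=list)

    def scan(self, sku):
        """Handle one scanned item: update stock and emit events."""
        self.stock[sku] -= 1
        self.events.append(("inventory_updated", sku, self.stock[sku]))
        # Fire exactly once, when stock lands on the reorder point.
        if self.stock[sku] == self.reorder_point[sku]:
            self.events.append(("reorder_placed", sku))


inventory = Inventory(stock={"coffee": 3}, reorder_point={"coffee": 1})
inventory.scan("coffee")
inventory.scan("coffee")  # stock hits the reorder point here
```

Every scan produces an event, and the reorder is just one more event derived from the stream.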

The User Community

To make matters even more interesting, all of this functionality must be presented to, and operable by, minimum wage staff.  And you thought your users were difficult.

Who’s Watching All of These Events

Well, the manager, and the district manager, and even headquarters can tap into these event streams willy nilly.  Aggregates are computed for period to date, last day, last hour (sliding windows).  Dashboards are updated with KPIs.  And the rolling profit is calculated (people like this one).  Individual stores can be monitored just as easily as all the stores in a city, region, state, etc.  A particular clerk can be watched as well – both for performance and fraud.
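To make the "last hour" style of aggregate concrete, here is a minimal sliding window sum.  The class name and the use of plain seconds for timestamps are assumptions for the sketch, and it assumes events arrive in time order; a real CEP engine handles this (and a lot more) for you.

```python
from collections import deque


class SlidingWindowSum:
    """Running total of amounts seen within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, amount), oldest first
        self.total = 0.0

    def add(self, timestamp, amount):
        """Record a sale and return the current windowed total."""
        self.events.append((timestamp, amount))
        self.total += amount
        self._evict(timestamp)
        return self.total

    def _evict(self, now):
        # Drop events that have aged out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_amount = self.events.popleft()
            self.total -= old_amount


last_hour = SlidingWindowSum(window_seconds=3600)
```

Feeding each sale into `last_hour.add(timestamp, amount)` keeps the dashboard number current without ever re-querying the full history.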

24×7

To add yet another wrinkle into the equation, all of these systems have to be up and running 24×7.  Ever gone into a convenience store when the registers are down?  Or maybe the credit card system isn’t working?  Chances are you turned around and walked right back out.  This means that these systems are revenue critical – just like the Tier 1 apps in any bank in the world.

Improvement?

So how do we make life even better?  How can we shorten the amount of time you need to be in the store for your purchase (remember, TURNOVER is key here)?  How can we target specials based upon the individual in the store?  How can we shift purchases to higher margin items without cannibalizing lower margin items that are still necessary to keep in stock?  I will be highlighting methods for exactly this in coming posts.

The Message

The message is clear and the message is simple.  Event Driven Architectures are all around us and have been for quite some time.  The question is also simple – how can we leverage an EDA for competitive advantage rather than treating it as the cost of entry (imagine a gas/convenience store on the turnpike that didn’t have the systems described above in place)?  In the above real-world example, we see event streams, sliding windows, aggregates, correlation, statistics, triggers, analytics, dashboarding, fraud detection, and more.  And not a single mention of algorithmic trading!

Get On the Bus

I like this quote from a blog post that Marco made recently,

Lately I have been reading Event Processing : Designing IT Systems for Agile Companies by Mani Chandy and Roy Schulte. They present a good list of principles an event driven application should follow in order to actually implement EDA:

  • Report current events. Things that happen right now. Not (old) historical data
  • Push events - A producer creates an event and publishes it. Not the other way around. In EDA, a system never queries for events. It receives them.
  • Process events immediately – Don’t store events and query them later, that’s a task better left for databases
  • Know nothing of event destinations – Just fire away events and assume someone else will route them to proper destinations.
  • Events are not commands – You don’t know anything about event destinations, per above bullet, so you can’t command them to do anything.

So if you look at a CEP system, which should naturally fit into an EDA. Then the CEP system should obviously follow the principles above. (advertisement: ruleCore CEP Server does, it was built to be a component in an EDA)

I agree with Marco.  The one principle above, “Know nothing of event destinations,” seems to imply that some sort of a bus or distributed cache is being used by the Event Processing Server.  The only CEP vendors that I’m aware of that have their own bus/cache are Oracle, Tibco and Microsoft.  At Kaskad, we OEM’d Tibco’s high speed bus and used that extensively in the production deployment at the Boston Stock Exchange which involved a number of multiprocessor Linux and Solaris boxes.

By using a bus/cache, one can easily add new functionality to an EDA.  The new processes are simply put on the bus/cache, consume events/messages and then do something with them.  Without a bus/cache, hard wired connections must be made between the different functional components of the system – this increases the ‘brittleness’ of the system and, if done synchronously, increases latency and could potentially cause SLA violations.
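A toy, in-process version of the idea looks something like the sketch below.  The `EventBus` class and the topic and handler names are made up for illustration and have nothing to do with any vendor's API.  This version also delivers events synchronously, whereas a real bus would be distributed and asynchronous.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe bus: producers know topics, not consumers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The producer fires the event and never learns who received it.
        for handler in list(self._subscribers[topic]):
            handler(event)


bus = EventBus()

# First consumer: a running list of sale amounts for a dashboard.
totals = []
bus.subscribe("sale", lambda e: totals.append(e["amount"]))
bus.publish("sale", {"store": 1, "amount": 4.50})

# Added later: a fraud monitor, with no change to the producer.
flagged = []
bus.subscribe("sale", lambda e: flagged.append(e) if e["amount"] > 100 else None)
bus.publish("sale", {"store": 1, "amount": 250.00})
```

The fraud monitor was bolted on after the fact simply by subscribing, which is exactly the "know nothing of event destinations" principle at work.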