
Event Processing Functions According to Opher

Opher wrote a good post on what functions an Event Processing System should have.

Of the four general (and comprehensive) functions, I like #4 (the first three are a given and rather pedestrian):

4. Situation discovery / event pattern discovery: This function is to discover that some situation occurs without having predefined patterns, using intelligent techniques. While the first three types of functions are more investigated (although I can’t say that all issues are figured out), the fourth one is still a challenge, since there are some experiments, but generally it is not well established yet.
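To make #4 slightly more concrete, here is about the most primitive form of "discovering a situation without a predefined pattern": a Python sketch that learns a running mean and standard deviation of a stream (Welford's online algorithm) and flags values that sit far outside what it has seen so far. The names, the 3-sigma threshold and the warm-up length are illustrative assumptions. This is nowhere near the intelligent techniques Opher has in mind, but it shows the shift from matching a pattern someone wrote down to deriving "unusual" from the data itself.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance of a stream in O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; undefined (reported as 0) for fewer than 2 values.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def unusual_events(values, threshold=3.0, warmup=10):
    """Yield (index, value, z) for values far from everything seen before them.

    No pattern is supplied: "unusual" is learned from the stream itself.
    threshold and warmup are illustrative defaults, not recommendations.
    """
    stats = RunningStats()
    for i, x in enumerate(values):
        if stats.n >= warmup and stats.stddev > 0:
            z = (x - stats.mean) / stats.stddev
            if abs(z) > threshold:
                yield i, x, z
        stats.update(x)  # keep learning from every event, unusual ones included


# Usage: hypothetical latency readings with one outlier.
latencies_ms = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 48, 10]
for i, x, z in unusual_events(latencies_ms):
    print(f"event {i}: {x} ms looks unusual (z = {z:.1f})")
```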

If we look at the functionality current vendors provide in this area, the most advanced seems to be some type of OLAP capability. So if we were going to lay down a roadmap to #4, we would first have to establish where we are and what #4 requires, and then work out how to get there from here.

Live OLAP is certainly a good step in the right direction, and real-time mashups may help as well. Behind all of this is the internal operational data store, where all of an entity’s transactions are processed; within a company it may be called the Online Transaction Processing function. Adjunct to this might be a large, denormalized Dimensional Data Mart to facilitate traditional OLAP. Also important is what’s going on in the world outside the organization: external events can and do influence customers and companies in different ways. Consider, for example, the power that Google wields over shopping decisions today. If you’re looking for mistletoe belt buckles, there may be a couple hundred vendors who share your depraved sense of humor, but if they’re not on Google, you won’t find them.

It’s in the area of the Dimensional Data Mart that what is referred to today as CEP could provide the foundation for dynamic, event-driven OLAP. I don’t like the terms real-time and OLAP for this, but they’re the best the marketers could come up with to describe event-driven dimensional data analysis. So we’ll live with it.
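A minimal sketch of what event-driven dimensional analysis could mean, assuming a hypothetical sales event with three made-up dimensions: instead of rebuilding the cube in a batch, every incoming event updates every roll-up it belongs to, so any slice can be read at any moment. The class, field and dimension names are invented for illustration; this is not how Aleri or any other vendor implements Live OLAP.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

DIMENSIONS = ("region", "product", "channel")  # hypothetical dimensions


@dataclass(frozen=True)
class Sale:
    region: str
    product: str
    channel: str
    amount: float


class LiveCube:
    """Keeps SUM(amount) for every roll-up of the dimensions, updated per event.

    A key such as (("region", "EU"),) is the EU total across all products and
    channels; the empty key () is the grand total.
    """

    def __init__(self, dimensions=DIMENSIONS):
        self.dimensions = dimensions
        self.totals = defaultdict(float)

    def on_event(self, sale: Sale) -> None:
        coords = [(d, getattr(sale, d)) for d in self.dimensions]
        # One event touches 2**len(dimensions) cells: every subset of its coordinates.
        for r in range(len(coords) + 1):
            for subset in combinations(coords, r):
                self.totals[subset] += sale.amount

    def total(self, **slice_) -> float:
        # Build the key in dimension order so keyword order does not matter.
        key = tuple((d, slice_[d]) for d in self.dimensions if d in slice_)
        return self.totals.get(key, 0.0)  # .get avoids creating empty cells


# Usage
cube = LiveCube()
cube.on_event(Sale("EU", "widget", "web", 100.0))
cube.on_event(Sale("EU", "gadget", "store", 50.0))
cube.on_event(Sale("US", "widget", "web", 25.0))
print(cube.total(), cube.total(region="EU"), cube.total(product="widget", channel="web"))
```

The catch is that each event touches 2^d cells for d dimensions: fine for three dimensions, ruinous for thirty, so a real engine has to choose which aggregates to materialize.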

Visualization plays a large role in this. A CEP engine can slice and dice data to your processor’s content, but if you can’t view it, it don’t mean much. So whatever tool a user installs on their desktop must be able to present information in a condensed and meaningful way, allowing dimensions, filters, slices, etc. to be moved around at will. This means that the resulting queries need to be coordinated with the CEP engine, both to provide a snapshot or benchmark and to deliver event-driven incremental updates.
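The snapshot-plus-updates coordination could look something like this hypothetical sketch: a desktop view subscribes with a predicate describing its slice, gets the current values of that slice back immediately, and from then on receives only the cells that change. Everything here (the class name, the (region, product) cell key, the callback shape) is an assumption, and it is single-threaded; a concurrent engine would need a lock or sequence numbers so that no event slips between the snapshot and the subscription.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # hypothetical (region, product) cell
Listener = Callable[[Key, float], None]


class SliceServer:
    """Serves a desktop view: a snapshot of its slice, then incremental updates."""

    def __init__(self) -> None:
        self.cells: Dict[Key, float] = defaultdict(float)
        self.subscribers: List[Tuple[Callable[[Key], bool], Listener]] = []

    def subscribe(self, in_slice: Callable[[Key], bool], on_update: Listener) -> Dict[Key, float]:
        # Snapshot and registration happen back to back with no event in between
        # (true here only because everything runs on one thread).
        snapshot = {k: v for k, v in self.cells.items() if in_slice(k)}
        self.subscribers.append((in_slice, on_update))
        return snapshot

    def on_event(self, key: Key, amount: float) -> None:
        self.cells[key] += amount
        for in_slice, on_update in self.subscribers:
            if in_slice(key):
                on_update(key, self.cells[key])  # push only the changed cell, not a full refresh


# Usage: a view of the EU slice joins after some events have already arrived.
server = SliceServer()
server.on_event(("EU", "widget"), 100.0)
server.on_event(("US", "widget"), 40.0)
updates = []
snap = server.subscribe(lambda k: k[0] == "EU", lambda k, v: updates.append((k, v)))
server.on_event(("EU", "widget"), 20.0)
server.on_event(("US", "gadget"), 5.0)  # outside the slice: not pushed
```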

I’m familiar with what a number of firms have done in this area, the most complex and meaningful solution being at Citi. But it’s ‘hard-coded.’ Is there any CEP vendor out there today that has abstracted these processes to a level where they might be generally applicable? I doubt it, though I bet Aleri’s Live OLAP comes close.

But no CEP vendor today can claim anything remotely close to the vision that #4 above entails.  I look forward to Opher keeping us all on our toes.

Get On the Bus

I like this quote from a recent blog post by Marco:

Lately I have been reading Event Processing: Designing IT Systems for Agile Companies by Mani Chandy and Roy Schulte. They present a good list of principles an event driven application should follow in order to actually implement EDA:

  • Report current events – Things that happen right now, not (old) historical data.
  • Push events – A producer creates an event and publishes it, not the other way around. In EDA, a system never queries for events; it receives them.
  • Process events immediately – Don’t store events and query them later, that’s a task better left for databases
  • Know nothing of event destinations – Just fire away events and assume someone else will route them to proper destinations.
  • Events are not commands – You don’t know anything about event destinations, per above bullet, so you can’t command them to do anything.

So if you look at a CEP system, which should naturally fit into an EDA, then the CEP system should obviously follow the principles above. (Advertisement: ruleCore CEP Server does; it was built to be a component in an EDA.)
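Read as code, the principles might look like the following sketch, in which all names are hypothetical: the event is an immutable, past-tense fact rather than a command, the producer pushes it to a publish function it was handed and knows nothing about, and the consumer acts on each event as it arrives instead of storing it for later queries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass(frozen=True)
class OrderPlaced:
    """An event: an immutable statement that something happened, not a command."""
    order_id: str
    amount: float
    occurred_at: datetime


Publish = Callable[[OrderPlaced], None]


class OrderDesk:
    """Producer: pushes events as they happen and knows nothing of who receives them."""

    def __init__(self, publish: Publish) -> None:
        self._publish = publish  # injected: could be a bus, a queue, or a test stub

    def place(self, order_id: str, amount: float) -> None:
        self._publish(OrderPlaced(order_id, amount, datetime.now(timezone.utc)))


class LargeOrderAlert:
    """Consumer: reacts to each event immediately rather than querying a store later."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.alerts: List[str] = []

    def __call__(self, event: OrderPlaced) -> None:
        if event.amount > self.limit:
            self.alerts.append(event.order_id)


# Usage: the wiring below is the only place that knows where events go.
alert = LargeOrderAlert(limit=1000.0)
desk = OrderDesk(publish=alert)
desk.place("A1", 250.0)
desk.place("A2", 5000.0)
```

Note that the producer never names its consumer; someone outside it does the wiring, which is exactly the job a bus or distributed cache takes over.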

I agree with Marco. One principle above, “Know nothing of event destinations,” seems to imply that the Event Processing Server uses some sort of bus or distributed cache. The only CEP vendors I’m aware of that have their own bus/cache are Oracle, Tibco and Microsoft. At Kaskad, we OEM’d Tibco’s high-speed bus and used it extensively in the production deployment at the Boston Stock Exchange, which involved a number of multiprocessor Linux and Solaris boxes.

By using a bus/cache, one can easily add new functionality within an EDA: new processes simply attach to the bus/cache, consume events/messages and do something with them. Without a bus/cache, hard-wired connections must be made between the functional components of the system. This increases the ‘brittleness’ of the system and, if the connections are synchronous, increases latency and could cause SLA violations.
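To illustrate the decoupling argument, here is a toy in-process stand-in for a bus, written as a sketch and not modeled on Tibco's product or API: publishers put messages on a topic, each subscriber gets its own queue and thread, and publish() returns without waiting for any consumer. Adding a new process is one subscribe() call, and the publisher's code never changes. The class and method names are hypothetical.

```python
import queue
import threading
from collections import defaultdict
from typing import Any, Callable


class Bus:
    """In-process stand-in for a message bus: topics, subscribers, async delivery.

    Each subscriber has its own queue and thread, so publish() never blocks on a
    slow consumer. None is reserved as the shutdown sentinel, so it cannot be a message.
    """

    def __init__(self) -> None:
        self._queues = defaultdict(list)  # topic -> list of subscriber queues
        self._threads = []

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        q: queue.Queue = queue.Queue()
        self._queues[topic].append(q)

        def pump() -> None:
            while (msg := q.get()) is not None:
                handler(msg)

        t = threading.Thread(target=pump, daemon=True)
        t.start()
        self._threads.append(t)

    def publish(self, topic: str, msg: Any) -> None:
        for q in self._queues.get(topic, []):
            q.put(msg)  # returns immediately; delivery happens on the subscriber's thread

    def close(self) -> None:
        # Sentinels queue up behind pending messages, so everything is delivered first.
        for qs in self._queues.values():
            for q in qs:
                q.put(None)
        for t in self._threads:
            t.join()


# Usage: an audit process is added later without touching the publisher.
bus = Bus()
trades, audit = [], []
bus.subscribe("trades", trades.append)
bus.publish("trades", {"sym": "IBM", "px": 120.5})
bus.subscribe("trades", audit.append)
bus.publish("trades", {"sym": "MSFT", "px": 30.1})
bus.close()
```

A real bus adds everything this leaves out: delivery across machines, persistence, back-pressure and failure handling.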

Here's Johnny!

Welcome Back

Seems like not a lot has changed in the last year in the world of Complex Event Processing – and I agree with a lot of what Tim Bass has to say in this post.

Pigs & Lipstick

One of the most glaring inadequacies of the current crop of offerings in this space is that applications built with these graphical programming environments still require knowledge of, and configuration for, the production environment, i.e., number of machines, number of processors, etc. This means that deploying an application to a cluster of machines requires that the app be aware of, and configured for, inter-machine communication. The same goes for the number of processors in a machine. So while the graphical programming environments in the current products continue to improve, and the number of external system adapters continues to grow, it’s the same old back-end underneath: a back-end that does nothing to address the current direction of technology (lots and lots of multiprocessor machines in a cluster), or even cloud deployment for that matter.
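What is missing, roughly, is a back-end where the application states only how its work partitions and the runtime decides how far to spread it. A hypothetical Python sketch of that contract: the app hands over a partition key and a per-event handler, and the runtime discovers the processor count with os.cpu_count() and routes each key to a fixed worker by a stable hash, so per-key order survives. The class name is invented, and threads stand in for processors only to show the structure; in CPython, CPU-bound handlers would need processes, or separate machines, behind the same interface.

```python
import os
import queue
import threading
import zlib
from typing import Any, Callable, Optional


class PartitionedRuntime:
    """Runs one application handler across N workers chosen at run time.

    The application supplies only a partition key and a per-event handler; it is
    never told how many processors exist. Events with the same key always land on
    the same worker, so their relative order is preserved.
    """

    def __init__(self, key: Callable[[Any], str], handler: Callable[[Any], None],
                 workers: Optional[int] = None) -> None:
        self.key = key
        self.n = workers or os.cpu_count() or 1  # discovered, not configured
        self.queues = [queue.Queue() for _ in range(self.n)]
        self.threads = [threading.Thread(target=self._work, args=(q, handler), daemon=True)
                        for q in self.queues]
        for t in self.threads:
            t.start()

    @staticmethod
    def _work(q: queue.Queue, handler: Callable[[Any], None]) -> None:
        while (event := q.get()) is not None:  # None is the shutdown sentinel
            handler(event)

    def submit(self, event: Any) -> None:
        # crc32 is stable across runs, unlike Python's salted str hash().
        slot = zlib.crc32(self.key(event).encode()) % self.n
        self.queues[slot].put(event)

    def close(self) -> None:
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()


# Usage: the same application code, whatever the processor count.
ticks = [("IBM" if i % 2 else "MSFT", i) for i in range(100)]
seen = {"IBM": [], "MSFT": []}
runtime = PartitionedRuntime(key=lambda t: t[0], handler=lambda t: seen[t[0]].append(t[1]))
for tick in ticks:
    runtime.submit(tick)
runtime.close()
```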

This is NOT a Surprise

Most of the CEP products came to market just as clusters and multiprocessor-based solutions were coming of age, so it’s no surprise that the vendors didn’t design for that emerging reality. Even Kaskad’s solution, although it could dynamically take advantage of multiple processors in a machine, could suffer from process starvation if a particular set of threads got caught in a deadlock or hung waiting on a blocking operation.

It’s the Stream, Stupid

So in agreeing with Tim Bass’s post (from above), I’d like to highlight why algorithmic trading is a good niche for the current players. Orders and market data must be processed in order, and the natural solution is to partition applications over a number of machines that never need to communicate transparently with other machines in the cluster (so it’s not really a cluster, is it?). Given those timing and ordering requirements, the current product set’s inability to abstract inter- and intra-machine communication transparently isn’t really a hindrance. In the algorithmic trading space (which doesn’t really require CEP as defined by David Luckham; just really fast stream processing), the ability to use a visual programming language (because algorithmic trading requires constant tweaking) and the resulting minimal thread execution model work just fine.
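For contrast with Luckham-style CEP, here is the kind of really fast stream processing an algo desk leans on: a running VWAP per symbol, updated in arrival order with constant state per symbol. It is a hedged illustration with a made-up Trade type, not taken from any product. Nothing in it matches patterns across event types or tracks causality; it is a fold over an ordered stream, which is also why partitioning by symbol across machines that never talk to each other works.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Trade:
    symbol: str
    price: float
    size: int


def running_vwap(trades: Iterable[Trade]) -> Iterator[Tuple[str, float]]:
    """Yield (symbol, VWAP so far) after every trade, in arrival order.

    Each emitted value reflects exactly the trades for that symbol seen before
    it, which is why processing order matters.
    """
    notional: Dict[str, float] = {}
    volume: Dict[str, int] = {}
    for t in trades:
        notional[t.symbol] = notional.get(t.symbol, 0.0) + t.price * t.size
        volume[t.symbol] = volume.get(t.symbol, 0) + t.size
        yield t.symbol, notional[t.symbol] / volume[t.symbol]


# Usage
tape = [Trade("IBM", 100.0, 100), Trade("IBM", 102.0, 300), Trade("MSFT", 30.0, 50)]
vwaps = list(running_vwap(tape))
```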