It's Deja Vu – All Over Again

I was recently asked “What problems does CEP solve that cannot be solved with smart coding, a columnar database or a whopping great grid?” via a LinkedIn group for Complex Event Processing.  Here’s the link.  I think I understand the question, and if I do, it’s really the wrong question.  Which means of course that my answer is probably going to cause a bit of a ruckus.

WHAT IS CEP ANYWAY?

A system needs a few things before it can claim to incorporate CEP – the ability to continuously query an event stream, pattern matching semantics, and sliding windows over data.  For example:

SELECT PHASORID, AVG(FREQUENCY)  FROM SMARTGRID.WIN:TIME(30 MINUTES) GROUP BY PHASORID;

This sets up a continuous query – it listens for events that are named ‘SMARTGRID’ and calculates the average frequency by phasorid.  This is an example of a time based window.  The window always refers to the last 30 minutes.  Another example:

SELECT A.* FROM PATTERN(A=SMARTGRID(FREQUENCY>130) -> (TIME.INTERVAL=30 MINUTES) AND B=SMARTGRID(FREQUENCY<110 AND A.PHASORID=B.PHASORID));

This statement says, “I want to know when a phasor measures a high frequency followed by a lower frequency.”  This is an example of pattern matching.  Both examples demonstrate a continuous query – those queries just keep running, returning a continuous stream of results.
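For readers who think in general-purpose code, here is a rough Python sketch of what the pattern query is watching for.  It is not how any CEP engine is implemented.  The Reading class, the function name, and the choice to treat the 30 minutes as a deadline for the second reading are all assumptions made for illustration, and it assumes readings arrive in timestamp order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical stand-in for a SMARTGRID event."""
    phasor_id: str
    frequency: float
    timestamp: float  # seconds since an arbitrary epoch


def high_then_low(
    readings: Iterable[Reading],
    high: float = 130.0,
    low: float = 110.0,
    within_seconds: float = 30 * 60,
) -> Iterator[Tuple[Reading, Reading]]:
    """Yield (high, low) pairs where one phasor reads above `high`
    and then, within `within_seconds`, reads below `low`."""
    pending: Dict[str, Reading] = {}  # phasor_id -> latest unmatched high reading
    for r in readings:
        a = pending.get(r.phasor_id)
        # Forget a high reading once the time limit has passed.
        if a is not None and r.timestamp - a.timestamp > within_seconds:
            del pending[r.phasor_id]
            a = None
        if r.frequency > high:
            pending[r.phasor_id] = r
        elif r.frequency < low and a is not None:
            del pending[r.phasor_id]
            yield (a, r)
```

Because the function is a generator, it keeps producing matches for as long as readings keep arriving, which is the hand-rolled version of "continuous."  Everything else a real engine gives you (many concurrent queries, windows, a query language) is what you would still have to write yourself.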

THAT’S A LOT OF CEP

I’ve been focused primarily on solving problems using CEP for the last 6 years, and using event processing for the last 20+.  So, I’ve seen lots of examples of the right use of CEP and some wrong uses of CEP as well.  We’ll talk about the right uses here.  Mark Piper asks about, “smart coding.”  Let’s answer that one first.

SMART CODERS

Sure, smart coders can get along without CEP.  Just like they can get along without a database.  Or a messaging bus.  And if you’re a really smart coder, you don’t even need an operating system.  You’ll just write all of those using your copious amount of spare time that your employer provides you because you don’t have any deliverables that are time sensitive.  And since you don’t have any customers, you can do whatever you want anyway.  Right?  Wrong!

There’s a difference between an implementation of CEP and CEP itself – maybe you want the whole development environment à la StreamBase, Apama or Sybase.  Maybe you want an API.  That’s most likely up to your individual/team coding style and whether or not you actually buy into the “Use our platform for everything – it will save you time and money!” argument.

But the gist of this argument is that if you want CEP functionality on hand when solving problems in code, you probably want a general library, or system, that you can use instead of writing it yourself.  I’ve had the privilege of working with some very smart coders over the years, and it took even them slightly longer than a month or two to build something that could be used in a number of different scenarios.

CHEAPER CODE

So that’s the tech side.  What about the business side?  That’s even easier – I don’t want to hire a bunch of C++ coders to write infrastructure – unless that’s my business.  And most businesses don’t fall into that category.  A lot of big financial firms do get a bigger bang for the buck by writing their own stuff – they’ve got the staff (because they can afford the really smart coders it takes to build this stuff), they’ve got performance requirements that are extremely stringent, and it’s actually cheaper for them to build vs buy given the scope and breadth of deployment.  So, unless it’s your business to build infrastructure by using very expensive coders, it’s cheaper and just as effective to use third party libraries or systems.  And that’s not just for CEP but generally applicable across the board.

COLUMNAR DATABASES AND GRIDS?

Not sure if I understand the last part of Mark’s question, but I’ll take a stab anyway.  Mark mentions grid – and grid and stream processing typically end up in different parts of the same problem.  For example, if I’m running a big credit derivatives trading operation, I’ve probably got a grid running some fairly heavy compute and updating a shared cache of stuff that the bank needs for a variety of applications.  Then let’s imagine that I’m receiving client orders and want to do something based upon data in that grid.  I might just use a system that incorporates components of CEP (notice that I did not say a CEP system here – that’s important) to streamline the event stream processing characteristics of that application, using the grid cache to grab parameters for me as they change.  So the use of grid and CEP here is complementary – not diametrically opposed.  Two different compute problems, two different technologies applied to a very common pattern.  There are similar patterns involving the use of CEP for data-in-flight and columnar, or any database really, for historical data.

SO IN CONCLUSION

I recommend using existing code, rather than writing new code just because you’ve got smart coders, whenever it makes sense and is economically attractive.  Back in the ’90s when we were selling EAI applications, all the C++ guys whined, “We can do it better and faster.”  While that may have been true (very few ever actually proved that to me), it was usually far more expensive and very brittle.  Tools exist for a reason – it’s what separates us from the animals.

AND AS ALWAYS

Thanks for reading!

DarkStar Filters Twitter Stream in Real Time

I show how to use DarkStar to filter the Twitter stream.

Analytical Platforms, Databases, & CEP

This from the book, “Hadoop – the Definitive Guide,”

“This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system.”

BUT I THOUGHT IT WAS ABOUT BIG DATA?

It is, but Hadoop is not designed, at least today, for anything other than write once (and now maybe append), and then analyze many times over and over again.  Databases are traditionally better suited for applications that need to read, write, and update data.  That is, until columnar databases arrived on the scene.  Thank you Michael Stonebraker, et al.

STONEBRAKER, AGAIN?

Yup, a couple of years ago, Mr. Stonebraker said that Map/Reduce was a major step backwards.  That’s when he was actively involved with Vertica.  Just recently, he said that data warehouses are too big for in-memory approaches.  It would appear that his opinions have evolved along with technology.  Hmm.  Can you have it both ways?  I think not.  So?

AGAIN, WHAT’S YOUR POINT?

My point is that analytical platforms are different than databases, and different than data warehouses.  And if we look at databases today, we can make a broad distinction – those databases with in-db map/reduce and those without.  Aster Data, Greenplum, and Sybase (planned) on the commercial side fit that bill as do Riak and MongoDB on the free-for-now front.  Analytical platforms need to analyze big data, the faster the better, with deep & complex analytical capabilities, and that lends itself to an in-db map/reduce ‘brother can you par a dime’ (paradigm).

THIS JUST IN FROM THE PROSPECTIVE CUSTOMER (what we call, “The Market”)

Everyone that I’ve been speaking with in the analytical platform market doesn’t just want to store their data and analyze it later, they want to analyze it *now* and store it and analyze it later.  And then again.  Lather.  Rinse.  Repeat.  But unless you’re a vendor like Sybase who has CEP, or Oracle who has CEP, or Vertica who has Streambase, how are you going to offer your customer those capabilities?  (and yes, I’ve mixed traditional database offerings just now with event processing add-ons in a mixed metaphor worthy of Bush himself)

CEP JUST ISN’T THAT MOTIVATING ANYMORE

During all the hubbub during SAP’s acquisition of Sybase, there was nary a mention of CEP.  No, our darling CEP platforms had taken a backseat to the mobile love child that had been in the making for years.

BUT IT’S STILL A TICK MARK

Which means that if you’ve got a database, especially an analytical database, then you’re going to want to combine the loading ability with some type of event processing functionality.  This is just common sense. Or your customer might go buy from someone else.

THE DRUMROLL PLEASE

Seems like a lot of software companies are buying hardware companies.  And hardware companies are buying software companies.  Like HP, Greenplum, EMC, 3PAR, etc.  There aren’t too many analytical platforms out there.  There are fewer columnar guys.  But there’s only one pure play CEP vendor left, and that’s StreamBase.  So the question isn’t who’s going to buy which analytical platform or columnar datastore, it’s who’s going to buy StreamBase.  And my bet is HP.

AND AS ALWAYS

Thanks for reading.

Event Processing in the Cloud – DataSift is a Big Proof Point

In the past year or so, I’ve heard from many skeptics – people who didn’t believe that Event Processing could be successfully deployed in the cloud.  Granted, most of these folks represented firms actively engaged in providing the High Frequency Trading (Algo Trading) industry with tools.  And in that arena, cloud deployment probably doesn’t make sense.  Yet.

CLOSER TO HOME THOUGH

Ask people in Capital Markets about Twitter and the most common response you’ll get is, “What do people use it for?”  This is because most of the people in Capital Markets can’t use things like Twitter, instant messaging, or Facebook at work and if they can, it’s heavily regulated.  But the point is, that they mostly don’t get it – I myself was included in this camp until a friend of mine explained it to me.  Since then, I’ve taken to Twitter like a fish to water.  My point here is that there are a lot more people in the world who know something about Twitter than High Frequency Trading.

SO WHAT?

I’ll tell you.  This past week we saw the announcement of DataSift; there’s a great video of DataSift in operation here.  I’m really impressed with DataSift’s capabilities.  Although I don’t think their filtering capabilities incorporate CEP as some have claimed (I didn’t see pattern matching or windows in the demo), and I strongly disagree with Nick Halstead’s claims that their offering is the only one out there with these capabilities, I think DataSift proves a very interesting point.

AND THAT IS?

Twitter is probably the closest thing that we have that embodies the event cloud (via a single source anyway).  Ask 20 people what Twitter is and you’ll get 20 different answers.  My answer would be, “a living, breathing, consciousness – what’s the world thinking about and doing right now?” And in that regard, Twitter is an “event cloud.”  And DataSift is querying and filtering that “event cloud” in real time – providing relevance, or extracting items of interest.  To you, right now – not tomorrow in a report on your desk or in a daily email digest.  Right now.  And they’re using an event driven architecture to do it.

THE CONCLUSION

DataSift is processing the entire Twitter fire hose and, although statistics are hard to come by in terms of what that means, Mr. Halstead is readying the platform for release at web scale.  So, that’s big data in, and ‘sifted’ data out, to potentially millions of users, all simultaneously.  I think those who claimed, “The cloud is too slow for event processing,” might be eating some crow soon.

THE SINGULARITY IS APPROACHING

And you either get it, or you don’t.  And if you do, the question isn’t whether or not you’ll use tools like DataSift, but how soon.

AND AS ALWAYS

Thanks for reading.

Data Mining in Streaming Data – CEP & SAX

In the last couple of posts, I’ve outlined a method for both reducing the dimensionality of continuous data and also reducing it to symbols to make further analysis easier. The method we’ve been using is referred to as Symbolic Aggregate Approximation, or SAX.

STREAMING SAX

The examples that I’ve shown so far have been illustrated using Excel. But if we were serious about using SAX in a real world scenario, we’d most probably be processing some type of streaming data. SAX has application anywhere there’s a bunch of highly dimensional, continuous data being generated. But we’ll stick to stock market trade data for now.

I went out and purchased a month’s worth of IBM trade prices & volumes from the NYSE – it’s very easy to do, and you can do it here. Once I did that, I loaded the data into a MySQL database and prepared to process it within DarkStar, our distributed event processing system that uses components of streaming map/reduce and complex event processing.

BEFORE WE GET STARTED

In the examples I outlined earlier, I took an entire day’s worth of data, normalized it, and then applied piecewise aggregate approximation to it, dividing a trading day up into 7 roughly equal samples. Now that we’re going to process the data as it streams out of the exchange, how should we break things up? The answer depends upon the question you’re asking. If there’s a pattern you think shows itself every 10 minutes and consists of 10 discrete values, then we should sample 10 minutes worth of data and break it down into intervals of 1 minute using the techniques shown earlier. So, the first thing we’re going to do is create a named window. A named window is going to provide the data we need in a 10 minute, sliding window.  We describe the window like this:

CREATE WINDOW winTradeData.win:time(10 minutes) as select * from tradeEvent;
INSERT INTO winTradeData select * from tradeEvent;

These two statements 1) create a sliding window that contains the last 10 minutes of tradeEvent events, and 2) insert tradeEvent events into that window as they arrive.  The first statement creates a named window that has all of the fields from the tradeEvent event.  The second statement populates the window.  So far so good.
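If you want a feel for what the engine is doing on your behalf, a named time window can be approximated in a few lines of Python.  This is only a sketch and not DarkStar’s implementation: TradeEvent and TimeWindow are made-up names, expiry is driven by event timestamps rather than a wall clock, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class TradeEvent:
    """Hypothetical trade event; the field names are assumptions."""
    symbol: str
    price: float
    volume: int
    timestamp: float  # seconds since an arbitrary epoch


class TimeWindow:
    """Keeps only the events from the last `span_seconds` of event time."""

    def __init__(self, span_seconds: float) -> None:
        self.span_seconds = span_seconds
        self._events: Deque[TradeEvent] = deque()

    def insert(self, event: TradeEvent) -> None:
        """Add an event, then drop anything that has aged out."""
        self._events.append(event)
        cutoff = event.timestamp - self.span_seconds
        # Events exactly `span_seconds` old are treated as expired.
        while self._events and self._events[0].timestamp <= cutoff:
            self._events.popleft()

    def snapshot(self) -> List[TradeEvent]:
        """Return the current window contents, oldest first."""
        return list(self._events)
```

A 10 minute window would be TimeWindow(600), with insert() called once per arriving trade.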

WE’VE GOT A WINDOW FULL OF EVENTS, NOW WHAT?

Well, we’d like to break down the window into 10 equal segments of 1 minute each.  And then we’ll average and classify the 1-minute segments.  But before we can classify the data, we need to normalize it.  We want to do this every minute; we don’t want to wait and do this every 10 minutes, do we?  If we did, we might miss a whole bunch of patterns that started in the previous window and ended in the current window.  So we’re going to pull data from the window and normalize it every minute with this statement (I call this a ‘tumbling window’):

SELECT symbol, (price-avg(price))/std(price) as normalized_price FROM winTradeData output every 1 minute;

PICK A LETTER, ANY LETTER

We’re going to apply PAA to this resulting data set (see earlier post).  PAA will give us an average value for each time slice within the interval that we’re analyzing.  In this case, each slice is 1 minute long.  So we want to average all the trades for a 1 minute period and then look up the corresponding SAX letter.  We could write another query to accomplish this or perhaps modify the one above.  Once we have the averages, we can assign a letter and then we’ll have a SAX word.
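The averaging step might look something like this in Python.  It is a sketch under assumptions: the input is a list of (timestamp, normalized price) pairs already pulled from the window, the function name is invented, and a minute with no trades is reported as None rather than guessed at.

```python
from typing import Iterable, List, Optional, Tuple


def per_minute_averages(
    samples: Iterable[Tuple[float, float]],
    window_start: float,
    minutes: int = 10,
) -> List[Optional[float]]:
    """Average normalized prices into one bucket per minute.

    `samples` holds (timestamp_in_seconds, normalized_price) pairs.
    Samples outside the window are ignored.
    """
    sums = [0.0] * minutes
    counts = [0] * minutes
    for timestamp, value in samples:
        slot = int((timestamp - window_start) // 60)
        if 0 <= slot < minutes:
            sums[slot] += value
            counts[slot] += 1
    # A minute without trades has no average, so it is left as None.
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Each of the ten values then gets looked up against the SAX breakpoints to produce one letter of the word.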

SO WE’VE GOT A SAX WORD, NOW WHAT?

Now that we’re able to describe streaming data in a discrete way, with a lower bounding function, we’re ready to do some more things.  From an earlier post, I said that SAX could be used for clustering, classification, anomaly detection, and search.  We’re going to focus on search in the next post.

UNTIL THEN

Think about how this algorithm lends itself to a map/reduce (via Hadoop or via in database map/reduce) implementation and how we’d use SAX then to correlate streaming data to historical data – there’s a lot in this blog that talks about this, perhaps not in terms of SAX, but there’s work in map/reduce, inverted indexes, etc.  We’ll need all of that, and a little more, to string it all together.

THE NEXT POST

Will happen sooner than the last, I promise.

THANKS FOR READING!


Normalizing Streaming Data & Piecewise Aggregate Approximation

Ok, so you’ve read the last post, downloaded and read the papers on SAX, and you’re ready to get going!  Wonderful.  First, you’ll need some data which I’ve thoughtfully included for download here- SAX Prep (an excel file with some trades in it).  Download the data, and then follow along below.

WHAT ARE WE DOING?

What we want to do is take a whole bunch of numeric data and reduce the dimension of it and then convert it into some type of symbolic representation.  This is so we can do some other interesting things with it later that are much easier when the data is represented this way.  Currently, the data in the Excel spreadsheet that I’ve toiled for hours on just for you, looks like this:

What we see in this chart, is a day’s worth of trade prices for a make believe symbol.  Actually, I know the symbol, but I can’t tell you that because you didn’t buy the data!

In the next step, we want to normalize the data with a mean of 0 and standard deviation of 1.  So, compute the average for the day, and then for each price, subtract the average and divide by the standard deviation.  Or just use some Excel functions; which I have done in the spreadsheet for you.
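If you’d rather not use Excel, the same normalization is a few lines of Python.  A minimal sketch: the function name is made up, and it uses the population standard deviation, which is an assumption since the spreadsheet could just as easily use the sample version.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_normalize(prices: Sequence[float]) -> List[float]:
    """Rescale prices to a mean of 0 and a standard deviation of 1."""
    mu = mean(prices)
    sigma = pstdev(prices)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return [(p - mu) / sigma for p in prices]
```

For example, z_normalize([1, 2, 3, 4, 5]) puts the middle value at 0 and the two ends at roughly -1.41 and 1.41.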

Piecewise Aggregate Approximation (PAA)

Once we’ve normalized the data, we can apply PAA.  I picked time divisions of an hour, and averaged the normalized price information.  You can see the normalized price data and resulting buckets, as computed via PAA, in the chart at the right.

[Chart: With Applied PAA]

There’s something important to notice here: although I didn’t pick a bunch of divisions, which might have given more specificity to the resulting PAA analysis, you can see that the shape of the PAA looks like the underlying data.  This is important when we then use symbols to describe the patterns – because we’re using PAA underneath, we can calculate the distance between observed SAX patterns.  Also, you can see some of the statistically irrelevant spikes have been ignored.  Super Good!
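In code, PAA is little more than chunked averaging.  The sketch below splits the series into segments of roughly equal length by count, which is a simplification: the spreadsheet buckets by hour of the trading day, so the two only agree when trades are evenly spaced.  The function name is invented for the example.

```python
from typing import List, Sequence


def paa(values: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of
    `segments` consecutive, roughly equal-sized chunks."""
    n = len(values)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(values)")
    result = []
    for i in range(segments):
        start = i * n // segments
        end = (i + 1) * n // segments
        chunk = values[start:end]  # never empty because segments <= n
        result.append(sum(chunk) / len(chunk))
    return result
```

So paa([1, 2, 3, 4, 5, 6], 3) gives [1.5, 3.5, 5.5]: six points become three, and the overall shape survives.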

EVERYTHING’S A SYMBOL, MAN…

So, how do we go from normalized PAA to symbol?  Easy; if you look in the spreadsheet, you’ll see the values -1.28, -.84, -.52, -.25, 0, .25, .52, .84, and 1.28.  And I’ve associated letters with those #’s.  So, the first PAA is 1.68, which is greater than 1.28, so our word begins with I.
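The lookup itself is a binary search over the breakpoints.  Here is a hedged Python sketch using the nine values listed above.  One caveat: nine breakpoints carve the number line into ten regions, so this sketch uses ten letters and labels anything at or above 1.28 as "j", whereas the spreadsheet pairs one letter with each listed value and calls that top region I.  The letter assignment here is an assumption, so treat the spreadsheet as the reference for the lettering used in this post.

```python
from bisect import bisect_right
from typing import Sequence

# Breakpoints as listed in the spreadsheet.
BREAKPOINTS = [-1.28, -0.84, -0.52, -0.25, 0.0, 0.25, 0.52, 0.84, 1.28]


def sax_symbol(
    value: float,
    breakpoints: Sequence[float] = BREAKPOINTS,
    alphabet: str = "abcdefghij",  # assumed: one letter per region
) -> str:
    """Map a normalized PAA value to the letter for its region.

    A value equal to a breakpoint falls into the region above it.
    """
    if len(alphabet) != len(breakpoints) + 1:
        raise ValueError("need exactly one more letter than breakpoints")
    return alphabet[bisect_right(breakpoints, value)]


def sax_word(paa_values: Sequence[float]) -> str:
    """Turn a sequence of PAA values into a SAX word."""
    return "".join(sax_symbol(v) for v in paa_values)
```

Passing a different alphabet or set of breakpoints changes the lettering without touching the lookup.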

MAY I HAVE THE ENVELOPE PLEASE

So after all of this analysis, our SAX word that represents a whole lot of trade data is, “IFGDBAB.”  How cool is that?  A whole day’s worth of data expressed as a few symbols.  Think of how much easier it would be to look up a nearest neighbor to this pattern, or maybe classify it given some cluster analysis, or detect something that we haven’t seen before using suffix trees?  All much easier to do with symbolic vs continuous numeric data.

TAKE ME TO THE ‘B’ SECTION

If you read the papers I recommended, and have paid attention, you might notice a potential problem with the methodology outlined and applied here so far.  What is it?  Also, this has been a lot of fun to do using Excel, but I think we could actually get this done easier and faster using some good old sliding windows and aggregation (CEP).

AND AS ALWAYS

Thanks for reading – I’ll be showing how to do this using DarkStar next.  Because chances are if we’re doing this in real time, we’re doing it for *a bunch* of data, and Excel, although wonderful, just ain’t going to cut it.

Data Mining in Streaming Data

Lately, I’ve been working on some interesting projects involving not just the usual suspects of stream processing, but data mining within high velocity time series.  In conjunction with that effort, I’ve been doing a lot of research in the areas of symbolic representation, dimension reduction, clustering, indexing, classification, and anomaly detection.  A prolific researcher in this area is Dr. Eamonn Keogh – I’ll be applying some of his team’s ideas to some interesting customer problems and telling you all about it here.  Let’s get started!

TOO MUCH DATA

In dealing with real time streaming numerical data, there is just too much of it sometimes to do anything meaningful with it in real time.  For example, in pattern recognition, trying to compute nearest neighbors using continuous, highly dimensional data is a compute nightmare.  Or, once you’ve identified a pattern of interest, finding similar patterns either in historical data or in streaming data is extremely compute intensive, and until recently, outside the scope of streaming engines.  This is because if you need to go outside of main memory, even if you’re distributed like we are, say, “Hello!” to my friend, Latency!

NUMERICAL TECHNIQUES

There are several numerical techniques one can employ to summarize streaming numerical data.  The problem with these representations is that they are all continuous, or real valued.  Another large problem, according to Dr. Keogh, is that none of the popular techniques allows a distance measure that lower bounds a distance measure found in the underlying data.  This means that once you’ve conflated your data, any analysis on that representation might not be accurate, or representative of the underlying data stream.  Also, because the resulting values are not discrete, we can’t use algorithms like hashing or search.  Well, that’s no good!  So what to do?

HOT SAX – GETTING DOWN TO THE GIST

Symbolic Aggregate approXimation (SAX) allows data to be conflated, discretized, and distance to be calculated between observations.  That means we can use all of the wholesome goodness out there in the areas of clustering, indexing (search), classification, and anomaly detection while also dramatically reducing the amount of data we need to crunch.  Getting us closer to integrating streaming events and historical data.  Nirvana.  SAX is the result of much work done and still being done by Dr. Keogh and his team at University of California – Riverside and lots of information about that work can be found here.
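To make the distance claim concrete, here is a small Python sketch of the MINDIST measure described in the SAX papers, which lower bounds the Euclidean distance between the two original normalized series.  It uses a four letter alphabet purely to keep the table short; that choice, along with the function names, is an assumption made for illustration.

```python
from math import sqrt

# Breakpoints for a four letter alphabet under a standard normal
# distribution, chosen here only for brevity.
BREAKPOINTS = [-0.67, 0.0, 0.67]
ALPHABET = "abcd"


def symbol_distance(x: str, y: str) -> float:
    """Distance between two letters: zero for identical or adjacent
    letters, otherwise the gap between the breakpoints that separate them."""
    r, c = ALPHABET.index(x), ALPHABET.index(y)
    if abs(r - c) <= 1:
        return 0.0
    return BREAKPOINTS[max(r, c) - 1] - BREAKPOINTS[min(r, c)]


def mindist(word_a: str, word_b: str, original_length: int) -> float:
    """MINDIST between two SAX words of equal length, where
    `original_length` is the number of points in each original series."""
    if len(word_a) != len(word_b):
        raise ValueError("words must be the same length")
    w = len(word_a)
    total = sum(symbol_distance(x, y) ** 2 for x, y in zip(word_a, word_b))
    return sqrt(original_length / w) * sqrt(total)
```

Note that adjacent letters count as distance zero.  That is what keeps the measure from ever overstating the true distance, at the cost of being a little blind to small differences.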

CALL THE PREP CHEFS

First, we need to do some prep work, and I recommend reading the papers – they’re informative and there’s really not too much math either.  As a precursor to SAX encoding, we’ve got some work to do.  We’ll use Piecewise Aggregate Approximation as an intermediate step and before applying PAA, we’ll normalize the data.  In my next post, we’ll show some spiffy charts and graphs as we implement SAX within DarkStar (our distributed event processing system that incorporates streaming map/reduce & CEP functionality).  Go read the papers and then come back for some fun.

THANKS FOR READING!

Why I Love the Cloud Today – Up & Running

How many times have you thought to yourself, “Self, I’d really like to take a look at that wonderful, does everything that I need, server-based product” only to realize that you don’t have a machine, and if you did have a machine, you don’t have the OS because the product likes to run on RHEL and you only have Ubuntu lying around?  Sure, you could download an ISO, burn a CD, find a piece of hardware that has enough memory and disk on it (oddly enough, all of those machines in a development shop seem to be occupied..) and get going.

BUT WAIT, OPERATORS ARE STANDING BY

I ran into exactly this situation today.  I was cruising the ‘net – looking at analytical databases in conjunction with a project I’m working on and came across Greenplum’s offering.  I’d heard a lot about it, but I thought, “Oh, there will be endless meetings with pull-the-string sales guys spouting pre-recorded messages wanting to know why I wanted to use their database before they’d let me look at their software; and only after filling out a number of legal forms.  All of which will require legal review.”  But no, Greenplum had a Single Node Engine available for download.

EUREKA!

The website maintained my interest, something that’s harder and harder to do these days, and abracadabra, I received an email with links to download their database.  I excitedly clicked through, only to find that the database ran on an OS that I didn’t have handy.

PRACTICING WHAT YOU PREACH

I spend all day either talking, tweeting, or writing about elastic resource.  And when I’m not doing that, I’m probably writing code.  And then it came to me, in a flash – “Hey stupid, maybe Rackspace has a CentOS 5.5 image ready and willing?”  Well, Rackspace did – and I was up and running with Greenplum’s database humming along contentedly, waiting to do my bidding.

SURE, BUT HOW LONG DID THAT TAKE?

About an hour.  Seriously.  Oh, and I spent about a dollar.  Really.

SO IF YOU DON’T THINK THE CLOUD CHANGES EVERYTHING

You’ve either been living under a rock for the last 2 years, or like me, spend all day talking about it and forget that this stuff actually works!

AND AS USUAL

Thanks for reading! I appreciate your time.

Why I Love the Cloud Today – Easy Peasy FIX in the Cloud

We’re working with a customer who’d like to send us information using the FIX protocol.  FIX is used in electronic trading for sending orders and receiving executions from brokers, ECNs, and exchanges.
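If you haven’t seen FIX on the wire, a message is just a run of tag=value pairs separated by the SOH character (0x01).  The Python sketch below splits one apart.  It is an illustration only, not QuickFIX and not how DarkStar’s OnRamp works, and the sample order is abbreviated: it leaves out the body length and checksum fields, so a real FIX engine would reject it.

```python
from typing import List, Tuple

SOH = "\x01"  # FIX field delimiter

# Abbreviated, made-up order for illustration (not wire-valid).
# 8=BeginString, 35=MsgType (D is New Order Single), 55=Symbol,
# 54=Side (1 is Buy), 38=OrderQty, 40=OrdType (2 is Limit), 44=Price
SAMPLE_ORDER = SOH.join(
    ["8=FIX.4.2", "35=D", "55=IBM", "54=1", "38=100", "40=2", "44=125.50"]
) + SOH


def parse_fix(raw: str) -> List[Tuple[int, str]]:
    """Split a raw FIX message into (tag, value) pairs, in order.

    A list is returned instead of a dict because tags can repeat
    inside repeating groups.
    """
    fields = []
    for part in raw.split(SOH):
        if not part:
            continue  # skip the empty piece after the trailing SOH
        tag, _, value = part.partition("=")
        fields.append((int(tag), value))
    return fields
```

Once a message has been split like this, each order or execution is an ordinary event with named fields, which is the form a continuous query wants.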

DARKSTAR SPEAKS FIX

DarkStar, our cloud based, distributed event processing engine that incorporates streaming map/reduce and complex event processing, speaks FIX.  We use the QuickFIX open source FIX engine.  You can find it here.  It’s free.  We include this as a standard OnRamp (OnRamps are used to inject information into DarkStar) and we don’t charge for it.  We’re the only CEP vendor that includes FIX support for free.

DEPLOYMENT (GENTLEMEN, START YOUR STOP WATCHES..)

We have a standard OnRamp image.  One simply logs into our cloud, and deploys another virtual machine using the OnRamp image and our customer gets a dedicated VM to handle their FIX connection to their DarkStar cluster.  The OnRamp image already knows how to inject events into DarkStar, so we set a couple of configuration settings for the FIX engine, and we’re ready to start testing.  Really, that’s it.  The client’s FIX messages (events) are now ready for dynamic, CEP based query and streaming map/reduce style analysis.  Total time?  Less than 5 minutes.

CLOUD ISN’T NECESSARILY SaaS

If your SaaS doesn’t leverage elastic resource (like just spooling up a VM and instant-presto-change-o it’s available to do work), then it’s not really cloud based.  So while you can certainly make applications available via the cloud, taking the necessary steps to utilize elastic resource can have a fantastic ROI.  Like I pointed out above – a new FIX connection, running on a dedicated VM in less than 5 minutes.

AND AS ALWAYS

Thanks for reading!

Why CEP in the Cloud Makes Sense

CEP isn’t really about low latency.  The ability to do things quickly is important, just as in any system – especially those systems that grow and need to handle a lot of information.  Doing things quickly means doing things efficiently.  And doing things efficiently means less money spent on hardware.  Theoretically anyway.

SO WHAT IS REALLY COOL ABOUT CEP?

CEP gives one the ability to submit queries like “select symbol, avg(shares) from trade_stream group by symbol over 5 minutes emit every 1 minute.”  The CEP engine would consume this query, and then start returning an average of shares per trade for each symbol over the last 5 minutes, and it would then update that every 1 minute.  Granted, this is a very simple query, but the point here is that the queries are continuous.  That means that they’re submitted to the CEP server and they run until they’re told not to run any more.  So as the CEP engine continues to consume events, the queries keep running and producing results.
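To show what “continuous” means in practice, here is a hand-rolled Python sketch of that one query.  It is not how a CEP engine works internally.  The class and method names are invented, trades are assumed to arrive in timestamp order, and results are emitted when a trade arrives at or after the next one minute mark rather than on a real timer.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple


class ContinuousAverage:
    """Average shares per trade by symbol over a sliding window,
    emitted at a fixed interval of event time."""

    def __init__(self, window_seconds: float = 300, emit_seconds: float = 60):
        self.window_seconds = window_seconds
        self.emit_seconds = emit_seconds
        self._trades: Deque[Tuple[float, str, int]] = deque()
        self._next_emit: Optional[float] = None

    def on_trade(
        self, timestamp: float, symbol: str, shares: int
    ) -> List[Tuple[float, Dict[str, float]]]:
        """Consume one trade and return any results that came due."""
        if self._next_emit is None:
            self._next_emit = timestamp + self.emit_seconds
        emitted = []
        while timestamp >= self._next_emit:
            emitted.append((self._next_emit, self._averages(self._next_emit)))
            self._next_emit += self.emit_seconds
        self._trades.append((timestamp, symbol, shares))
        return emitted

    def _averages(self, now: float) -> Dict[str, float]:
        # Drop trades that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._trades and self._trades[0][0] <= cutoff:
            self._trades.popleft()
        totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
        for _, symbol, shares in self._trades:
            totals[symbol][0] += shares
            totals[symbol][1] += 1
        return {s: total / count for s, (total, count) in totals.items()}
```

That is about forty lines for one fixed query.  The engine’s value is that the next query is one more line of text, not another class.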

I ONLY WANT TO SEE WHAT I’M INTERESTED IN

So, if you were interested in various things, like when the sentiment regarding a certain theme hit a certain level in Twitter, or when that happened and a related stock either increased or decreased in price and volume, you could submit those queries to the CEP server and get results back when those conditions occurred.  CEP engines also typically provide pattern matching capabilities; like if B happens within 5 minutes of A happening, let me know.

RESOURCES AND MEMORY

If you’re querying a lot of data, or your time windows are large, you may need a lot of memory and a lot of CPU.  Let’s paint a scenario where you’re looking at real time sales from a lot of different stores.  And you’d like to slice and dice that information by many dimensions, and do it real time with CEP based continuous queries.  Great – that’s a perfect use case for CEP.  But depending upon how much data you’ve got and how much compute is required to roll everything up for analysis and subsequent drill down, and how many users you’ve got running these queries, you might just run out of cpu or memory.

WAITER? ANOTHER ROUND OF FRESH RESOURCES PLEASE

My definition of ‘cloud’ includes elastic resource.  That means when you need more storage, compute, etc. you ask for it and it arrives, almost magically.  And then using that new resource, you can expand your ability to perform some set of tasks.  As in the above paragraph, we might add more compute if we added more users, more high velocity big data, or more and more complex queries.  Adding additional virtual machines in the cloud is a perfect way to address this.

SO WHAT’S THE PROBLEM?

Well, CEP engines aren’t designed that way.  For the most part anyway.  If you want this kind of ability, you’ve basically got to assemble all of this yourself – using a variety of vendors and products.  Basic questions like, “How do end users enter queries?” and “How are users notified when the things that they’re interested in occur?” typically involve multiple products from multiple vendors and very expensive professional services; either from the vendor or a 3rd party.  And here’s something else to consider – vendors selling software licenses don’t really want to build your system.  Complex accounting rules don’t let software vendors realize license revenue until the project is complete and you’ve accepted the solution.  Also, just because someone knows how to build a CEP engine doesn’t really mean they know how to build the kind of system we’ve described above: 100s (maybe even 1,000s) of users, dynamic queries flying all over the place, and easy to use GUIs.  Nor does it mean they know anything about how to set all of this up to use elastic resources.  What happens if you go out and buy all of this hardware to support your solution and it flops?  Well, you’re out the hardware costs then, aren’t you?

WHY AM I WRITING THIS?

In the near future, you’re going to start seeing a new style of deploying CEP based applications.  CEP based applications incorporating streaming map/reduce functionality and RIA based graphical front ends.  And these applications will allow hundreds of users to analyze high velocity streaming big data.  And do it all very, very quickly.  And do it in the cloud.  All of the things that most CEP vendors would tell you are simply not possible.  Except this vendor.

AS ALWAYS

Thanks for reading!