Author Archives: colin

Analytical Platforms, Databases, & CEP

This, from the book “Hadoop: The Definitive Guide”:

“This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system.”

BUT I THOUGHT IT WAS ABOUT BIG DATA?

It is, but Hadoop is not designed, at least today, for anything other than write once (and now maybe append) and then analyze over and over again. Databases have traditionally been better suited to applications that need to read, write, and update data. That is, until columnar databases arrived on the scene. Thank you, Michael Stonebraker, et al.

STONEBRAKER, AGAIN?

Yup. A couple of years ago, Mr. Stonebraker said that Map/Reduce was a major step backwards. That’s when he was actively involved with Vertica. Just recently, he said that data warehouses are too big for in-memory approaches. It would appear that his opinions have evolved along with technology. Hmm. Can you have it both ways? I think not. So?

AGAIN, WHAT’S YOUR POINT?

My point is that analytical platforms are different from databases, and different from data warehouses. And if we look at databases today, we can make a broad distinction: those with in-db map/reduce and those without. Aster Data, Greenplum, and Sybase (planned) fall into the first camp on the commercial side, as do Riak and MongoDB on the free-for-now front. Analytical platforms need to analyze big data, the faster the better, with deep and complex analytical capabilities, and that lends itself to an in-db map/reduce ‘brother can you par a dime’ (paradigm).

THIS JUST IN FROM THE PROSPECTIVE CUSTOMER (what we call, “The Market”)

Everyone I’ve been speaking with in the analytical platform market doesn’t just want to store their data and analyze it later; they want to analyze it *now*, store it, and analyze it later. And then again. Lather. Rinse. Repeat. But unless you’re a vendor like Sybase, who has CEP, or Oracle, who has CEP, or Vertica, who has StreamBase, how are you going to offer your customers those capabilities? (And yes, I’ve just mixed traditional database offerings with event processing add-ons in a mixed metaphor worthy of Bush himself.)

CEP JUST ISN’T THAT MOTIVATING ANYMORE

During all the hubbub surrounding SAP’s acquisition of Sybase, there was nary a mention of CEP. No, our darling CEP platforms had taken a backseat to the mobile love child that had been in the making for years.

BUT IT’S STILL A TIC MARK

Which means that if you’ve got a database, especially an analytical database, you’re going to want to combine its loading ability with some type of event processing functionality. This is just common sense. Otherwise, your customer might go buy from someone else.

THE DRUMROLL PLEASE

Seems like a lot of software companies are buying hardware companies, and hardware companies are buying software companies: HP, Greenplum, EMC, 3PAR, etc. There aren’t too many analytical platforms out there. There are even fewer columnar guys. But there’s only one pure-play CEP vendor left, and that’s StreamBase. So the question isn’t who’s going to buy which analytical platform or columnar datastore; it’s who’s going to buy StreamBase. And my bet is HP.

AND AS ALWAYS

Thanks for reading.

Event Processing in the Cloud – DataSift is a Big Proof Point

In the past year or so, I’ve heard from many skeptics – people who didn’t believe that Event Processing could be successfully deployed in the cloud.  Granted, most of these folks represented firms actively engaged in providing the High Frequency Trading (Algo Trading) industry with tools.  And in that arena, cloud deployment probably doesn’t make sense.  Yet.

CLOSER TO HOME THOUGH

Ask people in Capital Markets about Twitter and the most common response you’ll get is, “What do people use it for?” This is because most people in Capital Markets can’t use things like Twitter, instant messaging, or Facebook at work, and if they can, it’s heavily regulated. But the point is that most of them don’t get it; I myself was in this camp until a friend of mine explained it to me. Since then, I’ve taken to Twitter like a fish to water. My point here is that there are a lot more people in the world who know something about Twitter than about High Frequency Trading.

SO WHAT?

I’ll tell you. This past week we saw the announcement of DataSift; there’s a great video of DataSift in operation here. I’m really impressed with DataSift’s capabilities. I don’t think their filtering incorporates CEP, as some have claimed (I didn’t see pattern matching or windows in the demo), and I strongly disagree with Nick Halstead’s claim that their offering is the only one out there with these capabilities. Still, I think DataSift proves a very interesting point.

AND THAT IS?

Twitter is probably the closest thing we have to an embodiment of the event cloud (from a single source, anyway). Ask 20 people what Twitter is and you’ll get 20 different answers. My answer would be, “a living, breathing consciousness: what’s the world thinking about and doing right now?” And in that regard, Twitter is an “event cloud.” And DataSift is querying and filtering that “event cloud” in real time, providing relevance, or extracting items of interest. To you, right now, not tomorrow in a report on your desk or in a daily email digest. Right now. And they’re using an event-driven architecture to do it.

THE CONCLUSION

DataSift is processing the entire Twitter fire hose, and although statistics on what that means are hard to come by, Mr. Halstead is readying the platform for release at web scale. So that’s big data in, and ‘sifted’ data out, to potentially millions of users, all simultaneously. I think those who claimed, “The cloud is too slow for event processing,” might be eating some crow soon.

THE SINGULARITY IS APPROACHING

And you either get it, or you don’t.  And if you do, the question isn’t whether or not you’ll use tools like DataSift, but how soon.

AND AS ALWAYS

Thanks for reading.

Data Mining in Streaming Data – CEP & SAX

In the last couple of posts, I’ve outlined a method for both reducing the dimensionality of continuous data and converting it to symbols to make further analysis easier. The method we’ve been using is referred to as Symbolic Aggregate Approximation, or SAX.

STREAMING SAX

The examples that I’ve shown so far have been illustrated using Excel. But if we were serious about using SAX in a real-world scenario, we’d most likely be processing some type of streaming data. SAX has applications anywhere there’s a lot of high-dimensional, continuous data being generated. But we’ll stick to stock market trade data for now.

I went out and purchased a month’s worth of IBM trade prices & volumes from the NYSE – it’s very easy to do, and you can do it here. Once I did that, I loaded the data into a MySQL database and prepared to process it within DarkStar, our distributed event processing system that uses components of streaming map/reduce and complex event processing.

BEFORE WE GET STARTED

In the examples I outlined earlier, I took an entire day’s worth of data, normalized it, and then applied piecewise aggregate approximation to it, dividing a trading day into 7 roughly equal segments. Now that we’re going to process the data as it streams out of the exchange, how should we break things up? The answer depends on the question you’re asking. If there’s a pattern you think shows itself every 10 minutes and consists of 10 discrete values, then we should sample 10 minutes’ worth of data and break it down into 1-minute intervals using the techniques shown earlier. So the first thing we’re going to do is create a named window, which will provide the data we need as a 10-minute sliding window. We describe the window like this:

CREATE WINDOW winTradeData.win:time(10 minutes) as select * from tradeEvent;
INSERT INTO winTradeData select * from tradeEvent;

These two statements 1) create a sliding window that holds the last 10 minutes of tradeEvent events, and 2) insert tradeEvent events into that window as they arrive. The first statement creates a named window with all of the fields from the tradeEvent event; the second populates it. So far, so good.
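To make the window semantics concrete, here is a minimal Python sketch of a time-based sliding window, roughly what win:time(10 minutes) does. It assumes events arrive in timestamp order and that time advances with event timestamps rather than a wall clock; the TradeEvent fields and the class names are hypothetical, since the post doesn’t list tradeEvent’s schema.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TradeEvent:
    # Hypothetical fields; the post does not list tradeEvent's schema.
    symbol: str
    price: float
    volume: int
    ts: float  # event time, in seconds


class TimeWindow:
    """Holds only the events from the last `span` seconds, similar in spirit to win:time()."""

    def __init__(self, span: float):
        self.span = span
        self.events: deque = deque()  # oldest event on the left

    def insert(self, event: TradeEvent) -> None:
        # Rough analogue of "INSERT INTO winTradeData select * from tradeEvent".
        self.events.append(event)
        self._evict(now=event.ts)

    def _evict(self, now: float) -> None:
        # Drop events that are `span` or more seconds older than `now`.
        while self.events and self.events[0].ts <= now - self.span:
            self.events.popleft()

    def prices(self) -> list:
        return [e.price for e in self.events]
```

With a span of 600 seconds, each insert ages out anything ten or more minutes older than the newest trade. Driving eviction from event timestamps keeps the sketch deterministic; a CEP engine typically can also advance time on a clock, so a quiet market still ages events out.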

WE’VE GOT A WINDOW FULL OF EVENTS, NOW WHAT?

Well, we’d like to break the window down into 10 equal segments of 1 minute each, and then average and classify those 1-minute segments. But before we can classify the data, we need to normalize it. And we want to do this every minute; we don’t want to wait and do it every 10 minutes, do we? If we did, we might miss a whole bunch of patterns that started in the previous window and ended in the current one. So we’re going to pull data from the window and normalize it every minute with this statement (strictly speaking, the 10-minute window still slides; we just emit results once a minute):

SELECT symbol, (price-avg(price))/stddev(price) as normalized_price FROM winTradeData output every 1 minute;
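Here is the same idea as a minimal Python sketch, for readers who’d like to see the arithmetic: once a minute, take whatever the last 10 minutes hold and z-normalize it. The function names are hypothetical, trades are assumed to be (timestamp, price) pairs sorted by time, and the sketch uses the population standard deviation, which may differ slightly from what an engine’s stddev computes.

```python
import statistics


def z_normalize(prices):
    """Rescale a list of prices to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(prices)
    std = statistics.pstdev(prices)
    if std == 0:
        # A flat window has no shape; map every value to the mean (0).
        return [0.0] * len(prices)
    return [(p - mean) / std for p in prices]


def normalized_snapshots(trades, window=600, every=60):
    """Every `every` seconds, z-normalize the prices from the last `window` seconds.

    `trades` is a list of (ts, price) tuples sorted by ts, in seconds.
    Returns a list of (tick, normalized_prices) pairs.
    """
    snapshots = []
    if not trades:
        return snapshots
    tick = trades[0][0] + every
    last_ts = trades[-1][0]
    while tick <= last_ts + every:
        # Everything in the half-open interval (tick - window, tick].
        in_window = [p for ts, p in trades if tick - window < ts <= tick]
        if in_window:
            snapshots.append((tick, z_normalize(in_window)))
        tick += every
    return snapshots
```

This rescans the trade list on every tick to keep the logic obvious; an engine would maintain the running sums incrementally as events enter and leave the window.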

PICK A LETTER, ANY LETTER

We’re going to apply PAA to the resulting data set (see the earlier post). PAA gives us an average value for each time slice within the interval we’re analyzing; in this case, each slice is 1 minute long. So we want to average all the trades for each 1-minute period and then look up the corresponding SAX letter. We could write another query to accomplish this, or perhaps modify the one above. Once we have the averages, we can assign letters, and then we’ll have a SAX word.
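Here is a rough Python sketch of that step, under a few stated assumptions: the normalized prices arrive as (timestamp, value) pairs, the 10-minute interval starts at a known time, a minute with no trades is given 0.0 (the normalized mean), and the breakpoints are the Gaussian ones listed in the Excel post. Nine breakpoints give ten regions, so the sketch labels them a through j; the spreadsheet’s own letter labels may be assigned differently. The function names are hypothetical.

```python
from bisect import bisect_right
from statistics import fmean

# Gaussian breakpoints from the Excel post; 9 breakpoints -> 10 regions.
BREAKPOINTS = [-1.28, -0.84, -0.52, -0.25, 0.0, 0.25, 0.52, 0.84, 1.28]
ALPHABET = "abcdefghij"  # one letter per region, lowest region first


def paa_by_minute(normalized, start, minutes=10, slot=60):
    """Average (ts, value) pairs into `minutes` consecutive slots of `slot` seconds."""
    buckets = [[] for _ in range(minutes)]
    for ts, value in normalized:
        idx = int((ts - start) // slot)
        if 0 <= idx < minutes:
            buckets[idx].append(value)
    # Assumption: a minute with no trades is given 0.0, the normalized mean.
    return [fmean(b) if b else 0.0 for b in buckets]


def to_sax_word(paa_values):
    """Replace each PAA average with the letter of the breakpoint region it falls in."""
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in paa_values)
```

bisect_right counts how many breakpoints lie at or below a value, which is exactly the index of its region, so the lookup is a binary search rather than a chain of comparisons.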

SO WE’VE GOT A SAX WORD, NOW WHAT?

Now that we’re able to describe streaming data in a discrete way, with a lower-bounding distance function, we’re ready to do more. In an earlier post, I said that SAX could be used for clustering, classification, anomaly detection, and search. We’re going to focus on search in the next post.

UNTIL THEN

Think about how this algorithm lends itself to a map/reduce implementation (via Hadoop or via in-database map/reduce), and how we’d then use SAX to correlate streaming data with historical data. There’s a lot on this blog about this, perhaps not in terms of SAX, but there’s work on map/reduce, inverted indexes, etc. We’ll need all of that, and a little more, to string it all together.

THE NEXT POST

Will happen sooner than the last, I promise.

THANKS FOR READING!


Normalizing Streaming Data & Piecewise Aggregate Approximation

OK, so you’ve read the last post, downloaded and read the papers on SAX, and you’re ready to get going! Wonderful. First, you’ll need some data, which I’ve thoughtfully included for download here: SAX Prep (an Excel file with some trades in it). Download the data, and then follow along below.

WHAT ARE WE DOING?

What we want to do is take a whole bunch of numeric data, reduce its dimensionality, and then convert it into some type of symbolic representation. This is so we can do some other interesting things with it later that are much easier when the data is represented this way. Currently, the data in the Excel spreadsheet that I’ve toiled over for hours just for you looks like this:

What we see in this chart is a day’s worth of trade prices for a make-believe symbol. Actually, I know the symbol, but I can’t tell you that because you didn’t buy the data!

In the next step, we want to normalize the data to a mean of 0 and a standard deviation of 1. So compute the average and the standard deviation for the day, and then, for each price, subtract the average and divide by the standard deviation. Or just use some Excel functions, which I have done in the spreadsheet for you.
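Written out, for a day’s n prices x_1 through x_n, the normalization is:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}, \qquad
z_i = \frac{x_i - \mu}{\sigma}
```

This is the population form of the standard deviation. Excel’s STDEV divides by n - 1 instead (STDEVP divides by n); either choice only rescales the normalized series slightly and preserves its shape, which is what the later steps rely on.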

PIECEWISE AGGREGATE APPROXIMATION (PAA)

Once we’ve normalized the data, we can apply PAA. I picked time divisions of an hour and averaged the normalized price information. You can see the normalized price data and the resulting buckets, as computed via PAA, in the chart at the right. There’s something important to notice here: although I didn’t pick many divisions, which might have given more specificity to the resulting PAA analysis, you can see that the shape of the PAA looks like the underlying data. This is important when we then use symbols to describe the patterns: because we’re using PAA underneath, we can calculate the distance between observed SAX patterns. Also, you can see that some of the statistically irrelevant spikes have been smoothed away. Super Good!

(Chart: With Applied PAA)
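The spreadsheet buckets by clock hour. The SAX papers more commonly describe PAA as splitting n points into w equal-size segments and averaging each one; here is a minimal Python sketch of that equal-size variant (the function name is hypothetical, and uneven splits are handled by rounding segment boundaries down).

```python
def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments` slices.

    When len(series) is not a multiple of `segments`, slice boundaries are
    rounded down, so some slices hold one more point than others.
    """
    n = len(series)
    if not 0 < segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    result = []
    for i in range(segments):
        lo = i * n // segments
        hi = (i + 1) * n // segments
        chunk = series[lo:hi]  # never empty, because segments <= n
        result.append(sum(chunk) / len(chunk))
    return result
```

Because each output value is an average over a slice, a single spike inside a slice moves that slice’s value only by spike / slice length, which is why isolated spikes largely disappear in the PAA shape.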

EVERYTHING’S A SYMBOL, MAN…

So, how do we go from normalized PAA to symbols? Easy: if you look in the spreadsheet, you’ll see the breakpoints -1.28, -0.84, -0.52, -0.25, 0, 0.25, 0.52, 0.84, and 1.28, and I’ve associated letters with those numbers. The first PAA value is 1.68, which is greater than 1.28, so our word begins with I.

MAY I HAVE THE ENVELOPE PLEASE

So, after all of this analysis, the SAX word that represents a whole lot of trade data is “IFGDBAB.” How cool is that? A whole day’s worth of data expressed as a few symbols. Think of how much easier it would be to look up a nearest neighbor to this pattern, classify it given some cluster analysis, or detect something we haven’t seen before using suffix trees. All of that is much easier to do with symbolic data than with continuous numeric data.
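For example, nearest-neighbor lookup can use the MINDIST measure from the SAX papers, which never overestimates the Euclidean distance between the underlying normalized series. A minimal Python sketch, assuming words over a ten-letter alphabet a through j (one letter per breakpoint region, lowest first) and that n is the number of original points behind each word; the function names are hypothetical.

```python
import math

BREAKPOINTS = [-1.28, -0.84, -0.52, -0.25, 0.0, 0.25, 0.52, 0.84, 1.28]
ALPHABET = "abcdefghij"


def cell_dist(a, b):
    """Lookup-table distance between two SAX letters (0 for the same or adjacent regions)."""
    r, c = sorted((ALPHABET.index(a), ALPHABET.index(b)))
    if c - r <= 1:
        return 0.0
    # Gap between the top of region r and the bottom of region c.
    return BREAKPOINTS[c - 1] - BREAKPOINTS[r]


def mindist(word1, word2, n):
    """Lower-bounding distance between two equal-length SAX words built from n points."""
    if len(word1) != len(word2):
        raise ValueError("words must be the same length")
    w = len(word1)
    total = sum(cell_dist(x, y) ** 2 for x, y in zip(word1, word2))
    return math.sqrt(n / w) * math.sqrt(total)


def nearest(query, candidates, n):
    """Return the candidate word with the smallest MINDIST to the query."""
    return min(candidates, key=lambda cand: mindist(query, cand, n))
```

Adjacent letters score 0 because two values on either side of a shared breakpoint can be arbitrarily close, so any positive distance there could overestimate the true distance.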

TAKE ME TO THE ‘B’ SECTION

If you read the papers I recommended and paid attention, you might notice a potential problem with the methodology outlined and applied here so far. What is it? Also, this has been a lot of fun to do in Excel, but I think we could get this done more easily and faster using some good old sliding windows and aggregation (CEP).

AND AS ALWAYS

Thanks for reading – I’ll be showing how to do this using DarkStar next.  Because chances are if we’re doing this in real time, we’re doing it for *a bunch* of data, and Excel, although wonderful, just ain’t going to cut it.

Data Mining in Streaming Data

Lately, I’ve been working on some interesting projects involving not just the usual suspects of stream processing, but data mining within high-velocity time series. In conjunction with that effort, I’ve been doing a lot of research in the areas of symbolic representation, dimension reduction, clustering, indexing, classification, and anomaly detection. A prolific researcher in this area is Dr. Eamonn Keogh; I’ll be applying some of his team’s ideas to some interesting customer problems and telling you all about it here. Let’s get started!

TOO MUCH DATA

In dealing with real-time streaming numerical data, there is sometimes just too much of it to do anything meaningful with it in real time. For example, in pattern recognition, trying to compute nearest neighbors using continuous, high-dimensional data is a compute nightmare. Or, once you’ve identified a pattern of interest, finding similar patterns either in historical data or in streaming data is extremely compute intensive and, until recently, outside the scope of streaming engines. This is because once you need to go outside of main memory, even if you’re distributed like we are, you get to say “Hello!” to my friend, Latency!

NUMERICAL TECHNIQUES

There are several numerical techniques one can employ to summarize streaming numerical data. The problem with most of these representations is that they are continuous, or real-valued, so we can’t use algorithms like hashing or search that depend on discrete values. And according to Dr. Keogh, the symbolic representations proposed before SAX have their own large problem: none of them allows a distance measure that lower-bounds the distance measured on the underlying data. This means that once you’ve conflated your data into one of those representations, any analysis on it might not be accurate, or representative of the underlying data stream. Well, that’s no good! So what to do?

HOT SAX – GETTING DOWN TO THE GIST

Symbolic Aggregate approXimation (SAX) allows data to be conflated and discretized, and allows distance to be calculated between observations. That means we can use all of the wholesome goodness out there in the areas of clustering, indexing (search), classification, and anomaly detection while also dramatically reducing the amount of data we need to crunch. That gets us closer to integrating streaming events and historical data. Nirvana. SAX is the result of much work done, and still being done, by Dr. Keogh and his team at the University of California, Riverside, and lots of information about that work can be found here.
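Stated as a formula, following the notation of the SAX papers: for two z-normalized series Q and C of length n, reduced to SAX words \hat{Q} and \hat{C} of length w, where dist is a lookup-table distance between letters computed from the breakpoints (zero for identical or adjacent letters), the lower-bounding property is:

```latex
\begin{aligned}
D(Q, C) &= \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2} \\
\mathrm{MINDIST}(\hat{Q}, \hat{C}) &= \sqrt{\frac{n}{w}}\,\sqrt{\sum_{i=1}^{w} \mathrm{dist}(\hat{q}_i, \hat{c}_i)^2} \;\le\; D(Q, C)
\end{aligned}
```

Because MINDIST never exceeds the true Euclidean distance, a search that discards candidates whose MINDIST is already too large can never throw away a genuine match.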

CALL THE PREP CHEFS

First, we need to do some prep work, and I recommend reading the papers; they’re informative, and there’s really not too much math. As a precursor to SAX encoding, we’ll normalize the data and then use Piecewise Aggregate Approximation (PAA) as an intermediate step. In my next post, we’ll show some spiffy charts and graphs as we implement SAX within DarkStar (our distributed event processing system that incorporates streaming map/reduce and CEP functionality). Go read the papers and then come back for some fun.

THANKS FOR READING!

Why I Love the Cloud Today – Up & Running

How many times have you thought to yourself, “Self, I’d really like to take a look at that wonderful, does-everything-that-I-need, server-based product,” only to realize that you don’t have a machine, and if you did have a machine, you don’t have the OS, because the product likes to run on RHEL and you only have Ubuntu lying around? Sure, you could download an ISO, burn a CD, find a piece of hardware that has enough memory and disk (oddly enough, all of the machines in a development shop always seem to be occupied...) and get going.

BUT WAIT, OPERATORS ARE STANDING BY

I ran into exactly this situation today. I was cruising the ’net, looking at analytical databases in conjunction with a project I’m working on, and came across Greenplum’s offering. I’d heard a lot about it, but I thought, “Oh, there will be endless meetings with pull-the-string sales guys spouting pre-recorded messages, wanting to know why I wanted to use their database before they’d let me look at their software, and only after I’d filled out a number of legal forms, all of which would require legal review.” But no, Greenplum had a Single Node Engine available for download.

EUREKA!

The website maintained my interest, something that’s harder and harder to do these days, and abracadabra, I received an email with links to download their database.  I excitedly clicked through, only to find that the database ran on an OS that I didn’t have handy.

PRACTICING WHAT YOU PREACH

I spend all day talking, tweeting, or writing about elastic resources. And when I’m not doing that, I’m probably writing code. And then it came to me in a flash: “Hey stupid, maybe Rackspace has a CentOS 5.5 image ready and willing?” Well, Rackspace did, and I was up and running with Greenplum’s database humming along contentedly, waiting to do my bidding.

SURE, BUT HOW LONG DID THAT TAKE?

About an hour.  Seriously.  Oh, and I spent about a dollar.  Really.

SO IF YOU DON’T THINK THE CLOUD CHANGES EVERYTHING

You’ve either been living under a rock for the last 2 years, or like me, spend all day talking about it and forget that this stuff actually works!

AND AS USUAL

Thanks for reading! I appreciate your time.

Why I Love the Cloud Today – Easy Peasy FIX in the Cloud

We’re working with a customer who’d like to send us information using the FIX protocol. FIX is used in electronic trading for sending orders to, and receiving executions from, brokers, ECNs, and exchanges.

DARKSTAR SPEAKS FIX

DarkStar, our cloud based, distributed event processing engine that incorporates streaming map/reduce and complex event processing, speaks FIX.  We use the QuickFIX open source FIX engine.  You can find it here.  It’s free.  We include this as a standard OnRamp (OnRamps are used to inject information into DarkStar) and we don’t charge for it.  We’re the only CEP vendor that includes FIX support for free.

DEPLOYMENT (GENTLEMEN, START YOUR STOPWATCHES...)

We have a standard OnRamp image. One simply logs into our cloud and deploys another virtual machine using the OnRamp image, and our customer gets a dedicated VM to handle their FIX connection to their DarkStar cluster. The OnRamp image already knows how to inject events into DarkStar, so we set a couple of configuration settings for the FIX engine, and we’re ready to start testing. Really, that’s it. The client’s FIX messages (events) are now ready for dynamic, CEP-based query and streaming map/reduce style analysis. Total time? Less than 5 minutes.
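For readers who haven’t seen FIX on the wire, here is a minimal Python sketch of what turning a FIX message into an event might involve. In the real OnRamp, QuickFIX does the parsing; the functions below are hypothetical and only illustrate the tag=value format, the BodyLength (9) and CheckSum (10) fields, and a flat {tag: value} event. The sample message omits the session header fields (SenderCompID, TargetCompID, MsgSeqNum, SendingTime) that a real session adds, and repeating groups are not handled.

```python
SOH = "\x01"  # FIX field delimiter


def build_fix(fields, begin_string="FIX.4.2"):
    """Assemble a tag=value FIX message, filling in BodyLength (9) and CheckSum (10)."""
    body = "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    # CheckSum: sum of every byte before "10=", modulo 256, as three digits.
    checksum = sum((head + body).encode("ascii")) % 256
    return f"{head}{body}10={checksum:03d}{SOH}"


def fix_to_event(message):
    """Verify the checksum, then split the message into a {tag: value} dict."""
    raw = message.encode("ascii")
    pos = raw.rfind(SOH.encode("ascii") + b"10=")
    if pos < 0:
        raise ValueError("no CheckSum (10) field")
    expected = int(raw[pos + 4 : pos + 7])
    actual = sum(raw[: pos + 1]) % 256
    if expected != actual:
        raise ValueError(f"bad checksum: expected {expected}, got {actual}")
    # Later duplicates overwrite earlier ones, so repeating groups would be lost.
    pairs = (field.split("=", 1) for field in message.strip(SOH).split(SOH))
    return {int(tag): value for tag, value in pairs}
```

A single-stock order (35=D) for 100 shares of IBM would come through as something like {35: "D", 55: "IBM", 38: "100", ...}, which is the kind of flat event a continuous query can filter and aggregate.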

CLOUD ISN’T NECESSARILY SaaS

If your SaaS doesn’t leverage elastic resources (like spinning up a VM and, instant-presto-change-o, it’s available to do work), then it’s not really cloud based. So while you can certainly make applications available via the cloud, taking the necessary steps to utilize elastic resources can have a fantastic ROI. As I pointed out above: a new FIX connection, running on a dedicated VM, in less than 5 minutes.

AND AS ALWAYS

Thanks for reading!

Why CEP in the Cloud Makes Sense

CEP isn’t really about low latency.  The ability to do things quickly is important, just as in any system – especially those systems that grow and need to handle a lot of information.  Doing things quickly means doing things efficiently.  And doing things efficiently means less money spent on hardware.  Theoretically anyway.

SO WHAT IS REALLY COOL ABOUT CEP?

CEP gives one the ability to submit queries like “select symbol, avg(shares) from trade_stream group by symbol over 5 minutes emit every 1 minute.” The CEP engine would consume this query and start returning the average number of shares per trade for each symbol over the last 5 minutes, updating that every minute. Granted, this is a very simple query, but the point here is that the queries are continuous. That means they’re submitted to the CEP server and they run until they’re told to stop. So as the CEP engine continues to consume events, the queries keep running and producing results.
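To show what “continuous” means in practice, here is a minimal Python sketch that evaluates that query over a time-ordered list of trades and emits one snapshot per minute. It rescans the window on every tick for clarity, whereas a CEP engine would maintain the aggregates incrementally; the function name and the (ts, symbol, shares) tuple layout are assumptions.

```python
from collections import defaultdict


def continuous_avg(trades, span=300, every=60):
    """Evaluate "avg(shares) group by symbol over 5 minutes emit every 1 minute".

    `trades` is a list of (ts, symbol, shares) tuples sorted by ts, in seconds.
    Yields (tick, {symbol: average_shares}) once per `every` seconds.
    """
    if not trades:
        return
    tick = trades[0][0] + every
    while tick <= trades[-1][0] + every:
        sums = defaultdict(int)
        counts = defaultdict(int)
        for ts, symbol, shares in trades:
            if tick - span < ts <= tick:  # trades in the last `span` seconds
                sums[symbol] += shares
                counts[symbol] += 1
        # An empty dict means no trades fell inside the window at this tick.
        yield tick, {s: sums[s] / counts[s] for s in sums}
        tick += every
```

As trades age out of the 5-minute window, a symbol’s average changes, and a symbol can drop out of the results entirely, without the query ever being resubmitted.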

I ONLY WANT TO SEE WHAT I’M INTERESTED IN

So if you were interested in various things, like when the sentiment around a certain theme on Twitter hit a certain level, or when a theme hit a certain level and a related stock’s price and volume rose or fell, you could submit those queries to the CEP server and get results back when those conditions occurred. CEP engines also typically provide pattern matching capabilities: for example, if B happens within 5 minutes of A, let me know.
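A minimal Python sketch of that “B within 5 minutes of A” pattern, assuming a time-ordered list of (timestamp, event_type) tuples. The function name is hypothetical, and the matching policy (pair each B with the most recent unconsumed A, then consume that A) is just one choice; CEP engines usually let you pick among several.

```python
def followed_by_within(events, first, second, within=300):
    """Detect "`second` happens within `within` seconds of `first`".

    `events` is a list of (ts, event_type) tuples sorted by ts.
    Returns (ts_of_first, ts_of_second) pairs.
    """
    matches = []
    last_first = None  # timestamp of the most recent unconsumed `first`
    for ts, kind in events:
        if kind == first:
            last_first = ts
        elif kind == second and last_first is not None and ts - last_first <= within:
            matches.append((last_first, ts))
            last_first = None  # consume it so one A fires at most once
    return matches
```

Unlike a plain filter, this has to remember state between events (the pending A and its timestamp), which is what separates pattern matching from simply selecting events that match a predicate.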

RESOURCES AND MEMORY

If you’re querying a lot of data, or your time windows are large, you may need a lot of memory and a lot of CPU. Let’s paint a scenario where you’re looking at real-time sales from a lot of different stores, and you’d like to slice and dice that information by many dimensions, in real time, with CEP-based continuous queries. Great: that’s a perfect use case for CEP. But depending on how much data you’ve got, how much compute is required to roll everything up for analysis and subsequent drill-down, and how many users you’ve got running these queries, you might just run out of CPU or memory.

WAITER? ANOTHER ROUND OF FRESH RESOURCES PLEASE

My definition of ‘cloud’ includes elastic resource.  That means when you need more storage, compute, etc. you ask for it and it arrives, almost magically.  And then using that new resource, you can expand your ability to perform some set of tasks.  As in the above paragraph, we might add more compute if we added more users, more high velocity big data, or more and more complex queries.  Adding additional virtual machines in the cloud is a perfect way to address this.

SO WHAT’S THE PROBLEM?

Well, CEP engines aren’t designed that way, for the most part anyway. If you want this kind of ability, you’ve basically got to assemble it all yourself, using a variety of vendors and products. Basic questions like “How do end users enter queries?” and “How are users notified when the things they’re interested in occur?” typically involve multiple products from multiple vendors and very expensive professional services, either from the vendor or a third party. And here’s something else to consider: vendors selling software licenses don’t really want to build your system. Complex accounting rules don’t let software vendors recognize license revenue until the project is complete and you’ve accepted the solution. Also, just because someone knows how to build a CEP engine doesn’t mean they know how to build the kind of system we’ve described above: hundreds (maybe even thousands) of users, dynamic queries flying all over the place, easy-to-use GUIs, and the know-how to set all of this up to use elastic resources. What happens if you go out and buy all of this hardware to support your solution and it flops? Well, you’re out the hardware costs then, aren’t you?

WHY AM I WRITING THIS?

In the near future, you’re going to start seeing a new style of deploying CEP-based applications: applications incorporating streaming map/reduce functionality and RIA-based graphical front ends. And these applications will allow hundreds of users to analyze high-velocity streaming big data, and do it all very, very quickly, and in the cloud. All things that most CEP vendors would tell you are simply not possible. Except this vendor.

AS ALWAYS

Thanks for reading!

Standalone CEP is Dead-Long Live the Database

In days of old, when CEP didn’t exist, and we called it ESP, or Event Stream Processing, the whole value proposition that most vendors in the space espoused was, “We don’t have to write stuff to the database to process it.  And that makes us really fast!”

THE VALUE PROPOSITION PLEASE

What made me start thinking about this was all the stirring lately in the in-memory database area. Hasso Plattner (SAP’s chairman) has been working on this for quite some time, and last week he announced some fairly startling news: SAP is planning to use a combined row/column in-memory store for everything. Because the underlying database is so fast, a lot of stuff that previously had to be pre-calculated doesn’t have to be anymore. And that can cut a database’s size by an order of magnitude, small enough to fit in memory. How? Let me explain.

OTP & OLAP

A company’s OTP (online transaction processing) environment is where the money is made. It’s typically a row-based, transaction-oriented, ACID-compliant store. Vendors like Oracle, Sybase, and Microsoft dominate here, with a growing segment of PostgreSQL and MySQL use. A business needs speed and transactions here: you want to know whether someone has bought something or not, and it needs to happen quickly so that customers don’t get upset and go somewhere else. The data models are normalized, key-based, and sparse.

When a company wants to analyze sales, or costs, or whatever, they typically extract all the data from their OTP environment, transform it, and load it into their data warehouse, and they may update their OLAP environment at the same time. Updating the OLAP environment involves taking all of the transaction data from the OTP environment and exploding it into huge fact tables with corresponding dimension tables, plus lots and lots of pre-aggregated results. This is all so that end-user OLAP tools can spin the data to give analysts a way to analyze all the OTP data: questions like “let me see sales by product by region by quarter,” or planning questions like “what happens if we raise salaries by 5%?” This creates a lot of data. And all that data takes more space. All because companies don’t want to mess with the OTP environment, and because disk is slow.

PHYSICAL CONSTRAINTS

Most of the so-called innovations in relational database technology exist because disk is slow. Normalized tables, indexes, etc. all exist so that data can be moved around as fast as possible, because disk is slow. And when relational databases were invented, disk was really slow. But if an in-memory database were fast enough not only to handle the needs of an OTP environment, but also to produce exploded fact tables and compute dimensional analysis on the fly, we wouldn’t need all that extra disk space. Sure, we still need transactions, because we need to know when something happened. But that can be done in memory now too.
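A toy Python sketch of the “compute it on the fly” idea: instead of storing pre-aggregated results, keep the transaction rows in memory and roll them up by whatever dimensions the analyst asks for, such as sales by product by region by quarter. The row layout and function name are hypothetical, and a real in-memory database would use far more efficient data structures and parallel scans than a Python loop.

```python
from collections import defaultdict


def rollup(sales, *dims):
    """Aggregate raw transaction rows by any combination of dimensions on demand.

    `sales` is a list of dicts with an "amount" key plus dimension keys.
    Returns {(dim values...): total_amount}.
    """
    totals = defaultdict(float)
    for row in sales:
        key = tuple(row[d] for d in dims)
        totals[key] += row["amount"]
    return dict(totals)
```

The same rows answer rollup(sales, "product", "region", "quarter") and rollup(sales, "region") without any stored cube; the trade-off is that every question costs a scan, which is only acceptable when that scan runs at memory speed.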

ON A CLEAR DISK, YOU CAN SEEK FOREVER

But if in-memory databases are that fast now, what happens to CEP’s value proposition?  Every meaningful CEP based system I’ve worked on in the last 5 years has involved some form of persistence somewhere.  Seems like you can’t really separate the two.  So why not combine them if in-memory databases are fast enough now to support this type of behavior?

MPP DATABASES & MAP/REDUCE

CEP is fine for solving some problems, but typically not those problems involving either a lot of data or a lot of compute.  Most CEP systems involve taking a high velocity data stream, decorating it with some fairly simple calculations, and then doing something when a trigger fires.  A perfect example of this is algo, or High Frequency Trading.  We haven’t seen CEP in sophisticated derivative or fixed income environments because of the compute required – that doesn’t fit the CEP model of yester-year.

Using massively parallel processing databases & map/reduce solves a very big problem in this area – data affinity.  In a grid or cloud based compute model, it’s easy to saturate the network getting data from where it lives to where it needs to be processed.  If a compute process is broken down using a map/reduce foundation, and the compute is run where the data lives, and results are then bubbled up, not only does the compute get done faster, but there’s less of a chance of saturating the network as well.
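A small Python sketch of that pattern, with each “node” simulated as a local list of (symbol, shares) rows; the function names and data layout are assumptions. The map step runs next to the data and produces a compact per-node summary, and only those summaries travel to the reducer.

```python
from collections import Counter


def map_partition(rows):
    """Runs where the partition lives: total shares per symbol, locally."""
    partial = Counter()
    for symbol, shares in rows:
        partial[symbol] += shares
    return partial


def reduce_partials(partials):
    """Runs on the coordinator: merge the small per-node summaries."""
    total = Counter()
    for partial in partials:
        # update() adds counts (unlike `+`, it keeps zero or negative totals).
        total.update(partial)
    return total
```

What crosses the network is one entry per symbol per node rather than one row per trade, and that gap grows with data volume, which is the data-affinity win described above.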

SO WHY NOT EMBED CEP IN YOUR MPP IN-MEMORY DATABASE?

That seems like a good idea to me.  Maybe the resulting latency isn’t low enough for things like algo trading, but then most applications don’t require that kind of latency.  On the plus side, you get to take advantage of virtual resource, cloud based things like compute, storage, and network.  That way, you can dynamically add more compute when you need to; as your business grows or to accommodate spikes.  And my bet is that the latency problem isn’t that far from being solved; even for algo trading applications.

MY PREDICTION

So if your CEP solution doesn’t also have a good persistence story, you’re toast. And if your database solution doesn’t have a good CEP story, you’re toast. I know vendors in both spaces who are not being considered for opportunities because they’re missing one of those critical components. Vendors like Oracle, Sybase, and now SAP agree with me. And customers do too. Customers are always right, right?

AND AS ALWAYS

Thanks for reading.

Let's Blame Everything on CEP & HFT

In a recent blog post, Tim Bass blames CEP for much of the world’s problems.  You can read his post here.

You can read my response here:

Tim,

Lots of misinformation and knee-jerk reactions are out there regarding HFT. Unfortunately, this is another one of them. Your posts are usually a bit inflammatory; that’s why I like reading your blog. But you usually provide at least a shred or two of evidence to support your thesis.

I’d really like to see some examples of how HFT is damaging individual investors – I’m personally not aware of any and I’ve spent the last 20+ years in the field – even before CEP became a popular buzz word. If you’ve got some examples, I’d love to see them.

We went through this same controversy back in 1987, when many people blamed that crash on portfolio insurance and automated trading. They were wrong then, as people are now in blaming the recent flash crash on HFT. When there are no buyers, prices plunge. When there are no buyers, HFT stops; HFT firms aren’t usually sending market orders blindly into a falling market. It’s just stupid to do so.

The recent flash crash had more to do with market structure and too much global debt. NYSE halted (temporarily; I know they don’t like that term) for what’s referred to as an LRP, or liquidity replenishment point. The goal of an LRP is to put a symbol into auction mode when a bunch of orders come in on one side, resulting in an imbalance. When those symbols were halted, orders for those symbols went to other exchanges, where there was no liquidity, and those orders ended up crossing at what are referred to as stub quotes. Stub quotes are required by some crossing engines to open the symbol for trading; no one is putting real bids in at a penny. When those orders crossed at a penny, the index calculations picked up the crosses, and the ‘market’ plunged when the index re-priced. If those sell orders that went off at a penny had been limit orders, this probably wouldn’t have happened. This may be an example of SOR: Stupid Order Routing.
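To make the order-type point concrete, here is a toy Python sketch: a market sell order walks down a thin book until it reaches a stub quote, while a limit order stops at its price and leaves the remainder unfilled. The bid levels, prices, and function name are invented for illustration, and this ignores LRPs, routing, and everything else a real matching engine does.

```python
def sell_into_book(bids, quantity, limit=None):
    """Fill a sell order against bids sorted best (highest) price first.

    `bids` is a list of (price, size). With `limit`, the order refuses to trade
    below that price and the unfilled remainder is left over.
    Returns (fills, unfilled_quantity).
    """
    fills = []
    remaining = quantity
    for price, size in bids:
        if remaining == 0:
            break
        if limit is not None and price < limit:
            break  # a limit order never sells below its limit price
        take = min(size, remaining)
        fills.append((price, take))
        remaining -= take
    return fills, remaining
```

With a hypothetical book of 100 shares bid at 40.00, 100 at 39.50, and a stub bid at 0.01, a 500-share market sell prints 300 shares at a penny; the same order with a 39.00 limit fills 200 shares and stops.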

When NYSE re-opened those symbols, trading resumed, magically, at the prices they were at before the LRP was activated. So that worked for NYSE. What didn’t work was other exchanges not honoring NYSE’s action. Knowing the rules is important; sending orders to off-primary exchanges can incur additional risk. We saw that during the flash crash.

So, this would have happened with or without HFT; with or without CEP; and with or without the vendors that sell all the hardware and software to make HFT possible. In fact, it could have happened with even one large sell order in an affected symbol. And HFT doesn’t usually involve large orders.

In fact, in this day of lessened or no returns for those who used to act as intermediaries in the market (specialists or market makers), it could be argued that those engaging in HFT are actually helping the individual trader (or investor) by providing more liquidity.

It’s easy to blame HFT for the machinations we see in the market today; it’s harder to blame your neighbor for taking on more debt than they can afford in order to buy more and more goods. And, as we saw during the flash crash, consumerism and greed are no longer American-only obsessions.

Some day, you’ve got to pay.  And thanks for reading.