<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>cloudeventprocessing.com</title>
	<atom:link href="http://blog.cloudeventprocessing.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloudeventprocessing.com</link>
	<description></description>
	<lastBuildDate>Sat, 24 Dec 2011 15:12:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Big Data Isn&#8217;t About the Data</title>
		<link>http://blog.cloudeventprocessing.com/2011/12/23/big-data-isnt-about-the-data/</link>
		<comments>http://blog.cloudeventprocessing.com/2011/12/23/big-data-isnt-about-the-data/#comments</comments>
		<pubDate>Sat, 24 Dec 2011 02:41:19 +0000</pubDate>
		<dc:creator></dc:creator>
				<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://blog.cloudeventprocessing.com/?p=1443</guid>
		<description><![CDATA[I&#8217;LL HAVE THE DATA SOUP PLEASE We had data yesterday, we have data today, and we&#8217;ll have data tomorrow.  In fact, we&#8217;ll have a lot more data tomorrow.  We&#8217;re adding data to our lives at what seems to be an &#8230; <a href="http://blog.cloudeventprocessing.com/2011/12/23/big-data-isnt-about-the-data/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>I&#8217;LL HAVE THE DATA SOUP PLEASE</strong></p>
<p>We had data yesterday, we have data today, and we&#8217;ll have data tomorrow.  In fact, we&#8217;ll have a lot more data tomorrow.  We&#8217;re adding data to our lives at what seems to be an exponential rate.</p>
<p>I belong to the Low Latency, Big Data group on LinkedIn; here&#8217;s the link &#8211; <a title="Get Your Group On!" href="http://www.linkedin.com/groups?home=&amp;gid=4208292&amp;trk=anet_ug_hm&amp;goback=%2Egmp_4208292">Low Latency, Big Data</a>.  There are some heavy hitters out there and we&#8217;ve been working together on a definition for big data.</p>
<p><strong>WHAT EXACTLY IS BIG DATA?</strong></p>
<p>There are a couple of working definitions in progress in the group, but the one I like that seems to be emerging really doesn&#8217;t have that much to do with data at all.  It&#8217;s about architecture.</p>
<p><strong>THE SCALES OF BIG DATA</strong></p>
<p>How about scaling to handle all this data?  I think one of the core tenets of big data is that it doesn&#8217;t fit on one machine.  Or maybe a better way to say that is that it won&#8217;t scale on one machine.  A seemingly natural progression to this logic is that if you want to play with big data, your architecture has to scale.</p>
<p><strong>GRAVITY IS HEAVY MAN</strong></p>
<p>Data has gravity.  Hadoop, Yahoo&#8217;s slow implementation of Google&#8217;s Map/Reduce, typically runs on top of HDFS, the Hadoop Distributed File System.  And when you run Hadoop jobs, the code is shipped out to the nodes in a Hadoop cluster and run against the data.  That scales &#8211; it scales because if you want to store more data and run more processes against it, you just add nodes.  For the most part, this means that Hadoop is a shared-nothing architecture.  That&#8217;s important if you want to scale.  (go read about it elsewhere, then see the error I made above, then continue reading).  Running code on the node with data means you don&#8217;t have to ship the data around the network.  See the gravity?</p>
<p><strong>BIG DATA DOESN&#8217;T REWARD SHARING</strong></p>
<p>It would seem that by sharing nothing between nodes (things like state), we can run processes in parallel.  Running things in parallel against the same size data instead of running serially means we should get done sooner.  That&#8217;s fairly obvious, right?</p>
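<p>Here&#8217;s a minimal sketch of that idea in Python.  It is not Hadoop code: the partitions, the <code>process_partition</code> function and the use of threads as stand-ins for cluster nodes are all assumptions made for illustration.  Each worker sees only its own slice of the data, and only the small partial results get combined at the end.</p>

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of records stored on one node.
partitions = [
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5],
]

def process_partition(records):
    """Runs on one 'node' and touches only that node's records (no shared state)."""
    return sum(records)

def run_job(parts):
    # Threads stand in for cluster nodes; each call sees exactly one partition.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(process_partition, parts))
    # Only the small partial results travel back to be combined.
    return partials, sum(partials)

partials, total = run_job(partitions)
print(partials, total)
```

<p>Adding a node in this toy model just means adding another inner list; nothing the workers do has to change.</p>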
<p><strong>WHAT&#8217;S NOT OBVIOUS THEN?</strong></p>
<p>Big data has nothing to do with data.  Big data is the advent of grid and parallel processing.  Big data represents the democratization of tools and processing power (cloud) made available to anyone with a credit card.  Grid and parallel processing have been around for a while.  Elastic resource is relatively new.  Free software for VC&#8217;s to pimp is almost entirely new.  (we call that &#8216;open source&#8217;)</p>
<p><strong>VOLUME, VARIETY, VELOCITY</strong></p>
<p>Whatever.</p>
<p><strong>THE TAKE AWAY</strong></p>
<p>The take away here is that if you&#8217;re a vendor and your solution only runs on one machine, you don&#8217;t scale well enough to handle big data, since big data is really about scaling out.  If you haven&#8217;t solved this not-so-easy problem, then using the phrase big data in your marketing is nothing more than a big lie.</p>
<p><strong>AS ALWAYS</strong></p>
<p>Thanks for reading.  And Happy Holidays!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2011/12/23/big-data-isnt-about-the-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What Exactly is Complex Event Processing Today?</title>
		<link>http://blog.cloudeventprocessing.com/2011/09/18/what-exactly-is-complex-event-processing-today/</link>
		<comments>http://blog.cloudeventprocessing.com/2011/09/18/what-exactly-is-complex-event-processing-today/#comments</comments>
		<pubDate>Sun, 18 Sep 2011 16:56:46 +0000</pubDate>
		<dc:creator></dc:creator>
				<category><![CDATA[CEP]]></category>

		<guid isPermaLink="false">http://blog.cloudeventprocessing.com/?p=1435</guid>
		<description><![CDATA[As much as I disagree with much of what Curt Monash writes, he did actually ask a good question recently in his post, &#8220;Renaming CEP&#8230; or not&#8221; Without getting into a rehash of the hash over there, let&#8217;s look at &#8230; <a href="http://blog.cloudeventprocessing.com/2011/09/18/what-exactly-is-complex-event-processing-today/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As much as I disagree with much of what Curt Monash writes, he did actually ask a good question recently in his post, &#8220;<a href="http://www.dbms2.com/2011/08/25/renaming-cep-or-not/">Renaming CEP&#8230; or not</a>&#8221;</p>
<p>Without getting into a rehash of the hash over there, let&#8217;s look at things a bit differently.  Let&#8217;s talk about what CEP is not.</p>
<p>I left trading to join a firm called NEON.  I was an early investor in this company, my mentor had started the firm, and to me it looked like the cat&#8217;s meow.  It was a great time, a lot of people made a ton of money, and I was introduced into the world of software via Enterprise Application Integration (EAI).  Using EAI, one could centralize the business logic associated with plugging different systems into each other, convert the format of one system into another, and pass messages around seamlessly between these applications.  There weren&#8217;t a lot of competitors at that time, we were arguably #1 in the space, and the technology became MQ Series Integrator (think Websphere, like I said, we made a lot of money.)</p>
<p>Well, CEP isn&#8217;t EAI because there&#8217;s no concept of format libraries &#8211; sure CEP engines use input/output adapters but so does every program ever written (I&#8217;m waiting for the first salesperson to license the keyboard/screen adapter set &#8211; available in different languages soon!).  We&#8217;re going to come back to EAI in a moment.</p>
<p>Throughout my career, the teams I&#8217;ve worked with have used a variety of 4th generation languages.  Stuff like Powerbuilder, SQL Windows, Paradox, etc.  Each of those environments had some common elements, screen designers, a domain specific language designed to make bizapp dev faster, and abstractions for common data stores.  Often times, our groups wrote servers that integrated with these front ends via RPC, a streaming connection, or databases.</p>
<p>CEP isn&#8217;t a 4th generation bizapp dev environment &#8211; there&#8217;s no facility for building GUIs.  Although some of the CEP platforms out there do have DSLs, some also use SQL derivations.  I&#8217;ve used the SQL derivations (I&#8217;ve worked at two of those co&#8217;s) and guess what?  The people in those firms hated using the language themselves.  &#8220;Yes, you could do a covariance matrix with &lt;insert proprietary get-me-sued-for-naming-it-here&gt; but I could do it faster and easier in a different language.&#8221;</p>
<p>I&#8217;ve also used many databases.  But you don&#8217;t use CEP to store data &#8211; you only process the data in flight.</p>
<p>So, CEP isn&#8217;t EAI, it&#8217;s not a database, and it&#8217;s not an application development environment.  Where, then, did CEP come from?  Let&#8217;s look at a couple of its roots.</p>
<p>The work out of Berkeley and the work out of Brown, Brandeis and MIT focused on event stream processing.  Here&#8217;s a blurb about Berkeley&#8217;s Telegraph:</p>
<p><strong>Telegraph is an adaptive data-flow system, which allows individuals and institutions to access, combine, analyze, and otherwise benefit from this data wherever it resides.  As a data-flow system, Telegraph can tap into pooled data stored on the network, and harness streams of live data coming out of networked sensors, software, and smart devices.  In order to operate robustly in this volatile, inter-networked world, Telegraph is adaptive &#8211; it uses new data-flow technologies to route unpredictable and bursty data-flows through computing resources on a network, resulting in manageable streams of useful information.</strong></p>
<p>And here&#8217;s one about Aurora (Brown, Brandeis, &amp; MIT):</p>
<p><strong>Aurora addresses three broad application types in a single, unique framework:</strong></p>
<ol>
<li><strong>Real-time monitoring applications continuously monitor the present state of the world and are, thus, interested in the most current data as it arrives from the environment. In these applications, there is little or no need (or time) to store such data.</strong></li>
<li><strong>Archival applications are typically interested in the past. They are primarily concerned with processing large amounts of finite data stored in a time-series repository.</strong></li>
<li><strong>Spanning applications involve both the present and past states of the world, requiring combining and comparing incoming live data and stored historical data. These applications are the most demanding as there is a need to balance real-time requirements with efficient processing of large amounts of disk-resident data.</strong></li>
</ol>
<p>Hmm.  I&#8217;ve worked with both of those packages &#8211; no mention of Complex Event Processing in there at all.  So where did that phrase even come from?  Well, it&#8217;s in the subtitle of David Luckham&#8217;s book, &#8220;The Power of the Event: An Introduction to Complex Event Processing in Distributed Enterprise Systems,&#8221; in which the good professor describes not so much an implementation, but a set of processes designed to help us all run our businesses and missions more effectively.  In the book though, David references a language that deals with streaming data.  Oh oh&#8230;.</p>
<p>Around 2005-2006, a couple of firms were struggling trying to describe what Event Stream Processing was and why it was important and more importantly, why you should be spending money on it.  I was the CTO of one of those firms.  We competed mostly against Streambase at the time.  Somewhere during that time frame, the phrase Complex Event Processing was adopted in an effort to differentiate.  At that time, Aleri wasn&#8217;t CEP &#8211; they were OLAP.  Streambase, formerly Grassy Brook, probably chose that name in homage to Stream Processing.  Kaskad is Swedish for waterfall, or where a bunch of rivers and/or streams collide.  I don&#8217;t think Apama ever used the phrase ESP, they were focused on trading from the start.  Starting to get the picture?</p>
<p>So, if CEP isn&#8217;t EAI, and it&#8217;s not a 4th generation bizapp tool, what is it?  I&#8217;ve probably kicked this dead horse enough, but one more time and it&#8217;s not going to notice.  CEP needs 4 things to be called CEP (or ESP&#8230;):</p>
<ol>
<li>Domain Specific Language</li>
<li>Continuous Query</li>
<li>Time or Length Windows</li>
<li>Temporal Pattern Matching</li>
</ol>
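<p>To make items 3 and 4 a little more concrete, here&#8217;s a rough Python sketch of a time window and a simple temporal pattern (&#8220;A followed by B inside the window&#8221;).  The class, the function and the event names are all made up for illustration; real engines express this in their own languages and do a great deal more.</p>

```python
from collections import deque

class TimeWindow:
    """Keeps only the events whose timestamp falls within the last `span` seconds."""
    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, kind) pairs, oldest first

    def add(self, timestamp, kind):
        self.events.append((timestamp, kind))
        # Evict everything that has aged out of the window.
        while timestamp - self.events[0][0] > self.span:
            self.events.popleft()

def followed_by(window, first, second):
    """Temporal pattern: some `first` event is later followed by a `second` event."""
    seen_first = False
    for _, kind in window.events:
        if kind == first:
            seen_first = True
        elif kind == second and seen_first:
            return True
    return False

window = TimeWindow(span=10)
matches = []
# Hypothetical stream of (seconds, event kind), already in time order.
stream = [(0, "login_failed"), (4, "page_view"), (7, "password_reset"), (30, "password_reset")]
for ts, kind in stream:
    window.add(ts, kind)
    # The pattern is re-evaluated as each event arrives, not on a schedule.
    if followed_by(window, "login_failed", "password_reset"):
        matches.append(ts)
print(matches)
```

<p>The second password reset arrives after the failed login has aged out of the 10 second window, so only the first one matches.</p>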
<p>These 4 things, in my opinion, don&#8217;t make up a separate space, let alone a market.  What they describe is Event Stream Processing.  What they describe are features found in larger, more complete event processing environments from IBM, SAP, TIBCO, and Progress.  And TIBCO, for example, just added the missing features described above to their Business Events Platform, and had instant CEP (sarcasm mine).  Those offerings look a lot like the traditional EAI platforms &#8211; or where all of this began roughly 20 years ago.</p>
<p>So, I don&#8217;t think what a couple of vendors sell as Complex Event Processing is really CEP at all.  If you want an idea of what CEP is really all about, read David&#8217;s book to get started.  Then take a look at Tim Bass&#8217;s blog <a href="http://thecepblog.com/">thecepblog.com</a>.</p>
<p>For now, let&#8217;s just drop the phrase CEP (because it&#8217;s mostly just Stream Processing) because it means so little to so many and fails to impart any meaningful message to the people who actually write checks for this stuff.</p>
<p>Thanks for reading!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2011/09/18/what-exactly-is-complex-event-processing-today/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>It&#8217;s Time to Kill the Elephant</title>
		<link>http://blog.cloudeventprocessing.com/2011/07/03/its-time-to-kill-the-elephant/</link>
		<comments>http://blog.cloudeventprocessing.com/2011/07/03/its-time-to-kill-the-elephant/#comments</comments>
		<pubDate>Mon, 04 Jul 2011 03:49:57 +0000</pubDate>
		<dc:creator>colin</dc:creator>
				<category><![CDATA[Opinion]]></category>

		<guid isPermaLink="false">http://cloudeventprocessing.wordpress.com/?p=1374</guid>
		<description><![CDATA[Google started using MapReduce about 10 years ago.  Somewhere between there and now, Doug Cutting decided that he could copy it while at Yahoo and Hadoop was born.  Doug now works at a company named Cloudera who bills themselves as &#8230; <a href="http://blog.cloudeventprocessing.com/2011/07/03/its-time-to-kill-the-elephant/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Google started using MapReduce about 10 years ago.  Somewhere between then and now, <a title="Doug's Famous Now" href="http://en.wikipedia.org/wiki/Doug_Cutting">Doug Cutting</a> decided that he could copy it while at Yahoo and <a title="Wikipedia has everything.  Yay!" href="http://en.wikipedia.org/wiki/Apache_Hadoop">Hadoop</a> was born.  Doug now works at a company named <a title="Leader in Batch" href="http://cloudera.com">Cloudera</a>, which bills itself as providing the &#8220;only solution that manages Apache Hadoop across the enterprise.&#8221;  Hadoop has been around for so long that even leading analyst firms are covering it, claiming that if your organization is an early adopter, you need to be looking at Hadoop.  Hear that, Luddites?  Time to get moving.</p>
<div id="attachment_1375" class="wp-caption alignright" style="width: 310px"><a href="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/07/old-athlete.png"><img class="size-medium wp-image-1375" title="An Old Elephant" src="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/07/old-athlete.png?w=300" alt="" width="300" height="226" /></a><p class="wp-caption-text">Hadoop Is Picking Up Speed</p></div>
<p><strong>MAYBE THERE&#8217;S A REASON FOR THAT</strong></p>
<p>Recently, Google announced their move away from batch based MapReduce to something a little more real time.  Seems like it was taking days to update search results with something that you might be interested in.  Google never open sourced their implementation of MapReduce, which is said to be at least one or two orders of magnitude faster than Hadoop.  But still not fast enough.</p>
<p><strong>EVEN YAHOO IS GETTING INTO THE ACT</strong></p>
<p>Yahoo used to have a substantial relationship with Cloudera, at least according to Cloudera.  But now even Yahoo have started a company to distribute and support Hadoop.  Yahoo calls their company <a title="Elephants Are Large &amp; Slow" href="http://www.hortonworks.com">hortonworks</a>.</p>
<p><strong>WHAT THIS MEANS TO YOU</strong></p>
<p>Without getting into things like how much data and corresponding analysis you need to do before Hadoop makes any sense to use at all (most companies are not going to see any benefit at all), let&#8217;s recognize something.  All of these recent shifts show that companies like Google, Yahoo, and others no longer see a competitive advantage in batch based MapReduce.  The future has arrived, let&#8217;s look at some evidence.</p>
<p><strong>REAL TIME HADOOP</strong></p>
<div id="attachment_1379" class="wp-caption alignright" style="width: 304px"><a href="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/07/map-reduce-eps.png"><img class="size-medium wp-image-1379" title="MapReduce" src="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/07/map-reduce-eps.png?w=294" alt="" width="294" height="300" /></a><p class="wp-caption-text">MapReduce</p></div>
<p>There have been more than a handful of releases in this space &#8211; like <a title="Partial Fault Tolerance..." href="http://s4.io">S4</a> from Yahoo, <a title="The Most Scalable Real-Time Data Processing Platform" href="http://www.hstreaming.com">HStreaming</a>, and <a title="This isn't as new as they think..." href="http://tech.backtype.com/preview-of-storm-the-hadoop-of-realtime-proce">Storm</a> &#8211; and several NoSQL databases now support this too.  It means that for competitive advantage, you&#8217;d best be getting some real-time.  And getting it soon.</p>
<p><strong>WHAT IS REAL-TIME?</strong></p>
<p>Database vendors like <a title="Cassandra's Pimp" href="http://www.datastax.com">DataStax</a>, who support Cassandra, claim to be real-time.  They&#8217;re not.  They say that they&#8217;re real time because as soon as you commit data to the database, it&#8217;s available for query.  That&#8217;s supported by just about every database and hardly a new and exciting feature of NoSQL.  Even one of their big shots left to start a real time company named <a title="real time right now " href="http://www.platfora.com">Platfora</a>.</p>
<p><strong>CONTINUOUS QUERY OR EVENT-DRIVEN</strong></p>
<p>Rather than thinking about what real-time is or is not, let&#8217;s worry about event-driven.  Let&#8217;s use an example:</p>
<blockquote><p>I&#8217;m a manager, and I want to know when the average time on my website dips below 2 minutes.  Using the &#8216;my database is real time because the data I send to it can be queried after I write it&#8217; means that I would have to run this query repeatedly at regular intervals to catch this mounting exodus from my web properties.</p></blockquote>
<p><strong>THERE&#8217;S GOT TO BE A BETTER WAY</strong></p>
<p>And there is, it&#8217;s called <em>continuous query</em>.  I ask the same question as above, and there&#8217;s some process somewhere that&#8217;s sessionizing data from my web logs and injecting that into that server &#8211; the same server that I sent the query above to.  And when that process finds a web session that lasted less than 2 minutes, it sends another &#8216;row&#8217; to the program that submitted that query.</p>
<p><strong>ABRACADABRA</strong></p>
<div id="attachment_1377" class="wp-caption alignright" style="width: 310px"><a href="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/07/grow-old.jpg"><img class="size-medium wp-image-1377" title="Waiting for Hadoop Query" src="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/07/grow-old.jpg?w=300" alt="" width="300" height="168" /></a><p class="wp-caption-text">Waiting for Hadoop Query</p></div>
<p>And then I&#8217;ve got it on my dashboard, and can switch out the really badly designed page the marketing department A/B&#8217;d this morning.  That&#8217;s continuous query, or event-driven.  The term real-time didn&#8217;t even need to be mentioned.  If I was running batch based Hadoop, that notification could have taken hours, or days.  How much money would your company lose if that happened to you?</p>
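<p>Here&#8217;s roughly what that looks like as a toy Python sketch.  The <code>ContinuousQueryServer</code> class, its <code>register</code> and <code>inject</code> methods and the sample sessions are hypothetical, not the API of any real product.  The point is only that the query is submitted once and the server pushes rows as matching events arrive, so nobody has to poll.</p>

```python
class ContinuousQueryServer:
    """Toy stand-in for an event-driven server; not any real product's API."""
    def __init__(self):
        self.queries = []  # (predicate, callback) pairs registered by clients

    def register(self, predicate, callback):
        # The query is submitted once and stays resident on the server.
        self.queries.append((predicate, callback))

    def inject(self, session):
        # Called by the sessionizing process for every finished web session.
        for predicate, callback in self.queries:
            if predicate(session):
                callback(session)  # push a 'row' to whoever submitted the query

THRESHOLD_SECONDS = 120  # the 2 minutes from the example
dashboard = []           # rows pushed to the manager's dashboard

server = ContinuousQueryServer()
server.register(lambda s: THRESHOLD_SECONDS > s["seconds"], dashboard.append)

# Hypothetical sessions produced from the web logs.
sessions = [
    {"page": "/home", "seconds": 300},
    {"page": "/promo-b", "seconds": 45},
    {"page": "/pricing", "seconds": 180},
    {"page": "/promo-b", "seconds": 20},
]
for session in sessions:
    server.inject(session)

print([row["page"] for row in dashboard])
```

<p>The polling version of this would re-run the same query on a timer and find out about the short sessions only on the next run.</p>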
<p><strong>BACK TO MAP/REDUCE</strong></p>
<div id="attachment_1378" class="wp-caption alignleft" style="width: 310px"><a href="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/07/350px-locutusofborg2367.jpg"><img class="size-medium wp-image-1378" title="The Collective is Faster" src="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/07/350px-locutusofborg2367.jpg?w=300" alt="" width="300" height="232" /></a><p class="wp-caption-text">I am Node of Cluster...</p></div>
<p>So if I can do the above, why do I need MapReduce?  MapReduce is an algorithm for splitting work up, distributing the work out to nodes where the data lives that needs to be analyzed, and then gathering the results.  If your problem is big enough, MapReduce might help you get it done faster than using just one machine.</p>
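<p>The split, distribute and gather steps can be sketched in a few lines of Python.  This is the classic word count run in a single process, with made-up text chunks standing in for the data on each node; it assumes nothing about Hadoop&#8217;s or Google&#8217;s actual implementations.</p>

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one node's chunk of text."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key so each key is reduced in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical input, one string per node.
chunks = ["big data big cluster", "data has gravity", "big gravity"]
emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(emitted))
print(counts)
```

<p>On a real cluster the map calls would run where each chunk lives, and only the emitted pairs would cross the network.</p>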
<p><strong>BUT EITHER WAY</strong></p>
<p>If you&#8217;re running batch processes, like some well known web properties are, and think that Hadoop holds an answer to your ever dwindling ad revenue, you&#8217;re mistaken.  And if you&#8217;re that CIO, the other thing you need to be working on is most likely your resume.</p>
<p><strong>GET YOURSELF SOME CONTINUOUS QUERY, AND GET COMPETITIVE!</strong></p>
<p>and thanks for reading!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2011/07/03/its-time-to-kill-the-elephant/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Hope, FPGA&#8217;s, High Frequency Trading and the New Market Access Rules</title>
		<link>http://blog.cloudeventprocessing.com/2011/02/03/hope-fpgas-high-frequency-trading-market-access-rules/</link>
		<comments>http://blog.cloudeventprocessing.com/2011/02/03/hope-fpgas-high-frequency-trading-market-access-rules/#comments</comments>
		<pubDate>Thu, 03 Feb 2011 20:10:46 +0000</pubDate>
		<dc:creator>colin</dc:creator>
				<category><![CDATA[Capital Markets]]></category>
		<category><![CDATA[HFT]]></category>
		<category><![CDATA[Surveillance]]></category>

		<guid isPermaLink="false">http://cloudeventprocessing.com/?p=961</guid>
		<description><![CDATA[I recently became aware of an emerging practice most likely being implemented by clearing companies at the low end of the capitalization spectrum offering a unique solution to the recent Market Access Rules. NO UN-FILTERED DIRECT ACCESS What the SEC &#8230; <a href="http://blog.cloudeventprocessing.com/2011/02/03/hope-fpgas-high-frequency-trading-market-access-rules/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I recently became aware of an emerging practice most likely being implemented by clearing companies at the low end of the capitalization spectrum offering a unique solution to the recent <a title="Market Access Rules" href="http://www.suite101.com/content/sec-votes-to-stop-unfiltered-market-access-what-does-this-mean-a304266">Market Access Rules</a>.</p>
<p><strong>NO UN-FILTERED DIRECT ACCESS</strong></p>
<p>What the SEC is trying to do is remove, or at least reduce, the opportunity for crooks, idiots, or algos gone wild to do bad things to the market. Under the new rules, order flow needs to be monitored. This is not something that the HFT crowd likes to hear, because it slows them down. So a couple of innovative idiots got together and came up with the solution that I&#8217;m going to describe here.</p>
<p><strong>HOPE IS NOT A STRATEGY</strong></p>
<p>Let alone a comprehensive compliance or surveillance strategy. What the idiots are doing is putting a &#8216;black box&#8217; between the HFT firm&#8217;s FIX engines and the execution venues. The box, most likely powered by an FPGA device, scans the outbound order flow, and if it finds something it doesn&#8217;t like, it messes up the payload of the FIX order so that the execution venue (hopefully) rejects the message. Why is this done this way? Because the &#8216;black box&#8217; is out of process &#8211; both the source of orders and the resulting executions, etc. are behind FIX engines &#8211; and because the &#8216;black box&#8217; isn&#8217;t actually maintaining connections between the HFT firm&#8217;s order generators and execution venue.</p>
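<p>I don&#8217;t know exactly how these boxes mangle the payload, so treat the following Python sketch as an assumption.  It shows just one way a tampered message stops being valid: the FIX trailer (tag 10) carries a checksum, the byte sum of everything before it modulo 256, and overwriting a field without recomputing that trailer breaks it.  The order below is abbreviated and its field values are made up.</p>

```python
SOH = "\x01"  # FIX field delimiter

def checksum(data):
    """FIX CheckSum: byte sum modulo 256, rendered as three digits."""
    return "%03d" % (sum(data.encode("ascii")) % 256)

def build(body_fields):
    body = SOH.join(body_fields) + SOH
    # BodyLength (tag 9) counts the bytes between its own SOH and tag 10.
    head = "8=FIX.4.2" + SOH + "9=" + str(len(body)) + SOH
    return head + body + "10=" + checksum(head + body) + SOH

def is_valid(message):
    # Everything up to and including the SOH before tag 10 is covered.
    covered, _, trailer = message.rpartition("10=")
    return trailer == checksum(covered) + SOH

# Hypothetical, abbreviated new-order message; the field values are made up.
order = build(["35=D", "49=HFTFIRM", "56=VENUE", "34=7", "55=AMR", "54=1", "38=100", "40=1"])
# Assumed form of tampering: overwrite the quantity without fixing the trailer.
tampered = order.replace("38=100", "38=10X")

print(is_valid(order), is_valid(tampered))
```

<p>What a given venue does with a message like that is up to the venue, which is exactly why &#8216;hopefully&#8217; is in the paragraph above.</p>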
<p><strong>A PICTURE IS WORTH A MILLION REJECTS</strong></p>
<p>This is a little complicated, so let&#8217;s look at this picture:<br />
<a href="http://blog.cloudeventprocessing.com/?attachment_id=962" rel="attachment wp-att-962"><img class="aligncenter size-large wp-image-962" title="appliance" src="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/05/appliance1.png?w=1024" alt="" width="725" height="535" /></a></p>
<p>In the diagram above, the &#8216;black box&#8217; isn&#8217;t maintaining FIX connections to either the HFT&#8217;s order generators or the execution venue.  So, the &#8216;black box&#8217; can&#8217;t just reject the order if it&#8217;s out of bounds back to the order generator because then the FIX sequence #&#8217;s get all mixed up.  There&#8217;s a little more to this, but you get the general idea.</p>
<p><strong>YES, THIS IS REAL, AND I&#8217;M NOT KIDDING</strong></p>
<p>So, this whole thing is designed so that an examiner can come into the Olde Thyme Highe Frequency Trading Shoppey and be escorted into the back room and shown the shiny box.  Wow.  Are you serious?  &#8220;Look, we&#8217;re making sure that this firm isn&#8217;t doing anything wrong &#8211; we&#8217;re actively monitoring the flow and if they do something we don&#8217;t like, we shut them down.&#8221;  Right, they shut down the order flow attached to the box.  What about the order generators that the examiner doesn&#8217;t see? There&#8217;s a host of issues here, but we&#8217;re going to focus on one &#8211; and it&#8217;s a doozy.</p>
<p><strong>DENIAL OF SERVICE ATTACKS</strong></p>
<p>So, we&#8217;ve installed the OMICRON 5000 monitoring device and our HFT/algo team is ready to do business.  And everything is fine.  They&#8217;re trustworthy chaps and have no intention of gaming the system.  (cough cough).  But their first algo goes completely nuts.  And gets shut down by the clearing firm.  But it doesn&#8217;t really get shut down.  Instead, it&#8217;s sending 1,000s of malformed FIX messages to an execution venue per second.  Or maybe 10,000s of malformed FIX messages to many execution venues.  Wow.  In the internet world, we call this a denial of service attack &#8211; flood a destination with more traffic than it can handle.  And while the execution venues can handle normal traffic, what about rejecting every message? Is every execution venue out there ready for this?  I don&#8217;t think so.  I&#8217;ve been involved with FIX longer than I&#8217;ll admit to in public, and I&#8217;ve seen a lot of testing &#8211; &#8220;Yeah, reject worked.  It worked fine.  I mean, we never thought they&#8217;d be sending 1,000s of orders a second that would all reject&#8230;&#8221;</p>
<p><strong>I DON&#8217;T KNOW</strong></p>
<p>What should be done about this.   I have lots of ideas about surveillance and how it should be done.  But I don&#8217;t have any thoughts about this.  Mostly because I never thought anyone would be so stupid as to ever actually deploy this type of &#8216;solution.&#8217;  Where&#8217;s the SEC when you need them?</p>
<p><strong>THANKS FOR READING</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2011/02/03/hope-fpgas-high-frequency-trading-market-access-rules/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Cassandra&#8217;s Data Model</title>
		<link>http://blog.cloudeventprocessing.com/2011/01/27/cassandras-data-model/</link>
		<comments>http://blog.cloudeventprocessing.com/2011/01/27/cassandras-data-model/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 00:46:54 +0000</pubDate>
		<dc:creator>colin</dc:creator>
				<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://cloudeventprocessing.com/?p=922</guid>
		<description><![CDATA[As we prepare to implement our Market Data repository to facilitate algo development and back-testing, you should have downloaded Cassandra and installed it by now.  What, you haven&#8217;t?  Well, click here, get it done and then come back for some &#8230; <a href="http://blog.cloudeventprocessing.com/2011/01/27/cassandras-data-model/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As we prepare to implement our Market Data repository to facilitate algo development and back-testing, you should have downloaded Cassandra and installed it by now.  What, you haven&#8217;t?  Well, click <a title="Get Cassandra" href="http://cassandra.apache.org/download/" target="_blank">here</a>, get it done and then come back for some fun.  To get things up and running once you&#8217;ve downloaded Cassandra, click <a title="Configuration" href="http://www.datastax.com/docs/0.7/getting_started/index#installation" target="_blank">here</a> for some guidance (this assumes you&#8217;re running Linux but should point you in the right direction if you&#8217;re running Windoze).</p>
<p><strong>CONFUSION</strong></p>
<p>Most of the explanations I&#8217;ve read about Cassandra&#8217;s data model first extol the virtues of NoSQL and the evils of Relational Databases.  And so while getting the reader caught up in this mythic struggle that summons images from Tolkien’s Middle-earth, the point is lost.  And that point is?</p>
<p><strong>IT&#8217;S ALL ACTUALLY QUITE EASY</strong></p>
<p>Cassandra thinks about data the way we think about data.  Most of us think about data in rows and columns.  So does Cassandra.  But it also drops some extra stuff we don&#8217;t need while adding some stuff that we do need.  And that can be a little disconcerting initially.  To make things easier, let&#8217;s first describe a goal for our exercise.  We&#8217;d like to get a day&#8217;s worth of market data, by symbol, in ascending time order.  Also, we might like to get the data for a slice of time within that day.  Like, &#8220;give me all the BBO&#8217;s for American Airlines for May 20th, 2010,&#8221; or, &#8220;I&#8217;d like to see the BBO&#8217;s for American Airlines for May 20th, 2010 between 1 and 2pm.&#8221;  Let&#8217;s jump right in.</p>
<p><strong>LET&#8217;S GET OUR DATA</strong></p>
<p>As we subscribe to our favorite market data feed, we receive something like:</p>
<ul>
<li>Symbol,</li>
<li>Bid,</li>
<li>Offer,</li>
<li>Bid Size,</li>
<li>Offer Size,</li>
<li>Time Stamp, and</li>
<li>Seq # (most quote vendors provide a Sequence # because multiple quotes can occur for any given Time Stamp)</li>
</ul>
<p>We&#8217;re going to call this a column family.  Cassandra&#8217;s analog for a table is a Column Family.  You can see why this fits so well above &#8211; the columns that belong to the symbol AA comprise a family of related information.  I&#8217;d like to store this data by symbol, so later, I can retrieve it.  Using the Cassandra client (cassandra-cli &#8211; it&#8217;s in the bin directory where you installed Cassandra), let&#8217;s create the BBO Column Family.  It looks like this:</p>
<p><a href="http://blog.cloudeventprocessing.com/?attachment_id=954" rel="attachment wp-att-954"><img class="aligncenter size-full wp-image-954" title="cassandacf" src="http://cloudeventprocessing.files.wordpress.com/2011/01/cassandacf1.png" alt="" width="1019" height="322" /></a></p>
<p><code>create column family bbo with comparator = UTF8Type<br />
and column_metadata = [<br />
{column_name: symbol, validation_class:UTF8Type},<br />
{column_name: bb, validation_class: UTF8Type},<br />
{column_name: bo, validation_class: UTF8Type},<br />
{column_name: bbSize, validation_class: UTF8Type},<br />
{column_name: boSize, validation_class: UTF8Type},<br />
{column_name: timeStamp, validation_class: LongType},<br />
{column_name: seqNum, validation_class: LongType},<br />
];<br />
</code></p>
<p>And now that we’ve created the schema, let’s insert some quotes.</p>
<p><code>Set bbo['AA']['symbol']='AA';<br />
Set bbo['AA']['bb']='123.34';<br />
Set bbo['AA']['bo']='123.84';<br />
Set bbo['AA']['bbSize']='100';<br />
Set bbo['AA']['boSize']='200';<br />
Set bbo['AA']['timeStamp']=1234;<br />
</code></p>
<p>What happens when you execute a list bbo command now?  You&#8217;ll see the AA row with the columns we just set.  Easy enough.  So what happens when we get the next quote?  Well, we go to insert our data like this:</p>
<p><code>Set bbo['AA']['symbol']='AA';<br />
Set bbo['AA']['bb']='125.34';<br />
Set bbo['AA']['bo']='125.84';<br />
Set bbo['AA']['bbSize']='100';<br />
Set bbo['AA']['boSize']='200';<br />
Set bbo['AA']['timeStamp']=1235;<br />
</code></p>
<p>And then to see our data, enter this command (again):</p>
<p><code>List bbo;</code></p>
<p>When we use the ‘list bbo’ command, we’re only going to see the data last inserted for that row key.  What happened to the previous data?  It was overwritten with the new data, because each set reused the same row key and column names.  So if we wanted to save each quote, we could combine the timestamp with the column name and then we’d be inserting unique columns each time and we’d be fine.  But there’s a different way to do this.</p>
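<p>To make the overwrite concrete, here is a minimal sketch in plain Python.  It is not a Cassandra client; it only models a column family as a map of row key to a map of column name to value, and the helper name set_column is made up for illustration.</p>

```python
# A toy model of a column family: row key -> {column name -> value}.
# This is plain Python for illustration, not the Cassandra API.
bbo = {}

def set_column(cf, row_key, column, value):
    """Mimic the CLI 'set': writing an existing column replaces its value."""
    cf.setdefault(row_key, {})[column] = value

# First quote for AA
set_column(bbo, "AA", "bb", "123.34")
set_column(bbo, "AA", "timeStamp", 1234)

# Second quote for AA reuses the same column names, so it overwrites
set_column(bbo, "AA", "bb", "125.34")
set_column(bbo, "AA", "timeStamp", 1235)

# Workaround from the text: fold the timestamp into the column name
unique = {}
for ts, bid in [(1234, "123.34"), (1235, "125.34")]:
    set_column(unique, "AA", "bb:%d" % ts, bid)
```

<p>After this runs, bbo holds only the second quote for AA, while unique keeps one bid column per timestamp.</p>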
<p><strong>BIG DEAL, I DON’T SEE ANYTHING DIFFERENT HERE</strong></p>
<p>And you don’t, because we haven’t started introducing the special sauce yet.  Well, we kind of did.  In the schema definitions above, you’ll notice we didn’t say that much about what we could or couldn’t insert into a row.  We just started adding columns dynamically.  So, each row, which is identified by a key, can have different columns in it and even a different number of columns.</p>
<p><strong>WELL, THAT&#8217;S NOT GOING TO WORK</strong></p>
<p>So, how do we keep track of all the quotes for our symbol?  First, a little clarification: the Column Family above is really BBO, and we&#8217;ve inserted a row identified by the key &#8216;AA&#8217; and some associated tag/data value pairs.  Think of this as a map of maps.  So now, we need to insert the bits that change for a given symbol over time.  How could we do that?  We create a Super Column Family of course.  A Super Column Family contains Super Columns.  A Super Column is kind of like another row of data – so using our example above, the Super Column we’ll be inserting consists of the BB, BO, BB Size, etc. The data above gets inserted using [AA] as our row key, and we need to pick a key for the Super Column that contains the quote data.  Let&#8217;s pick Seq# as our Super Column key.  Our row key is still Symbol, and I’ve prepended the date to it.  This way, all the data for a day’s worth of AA will be in the same row.  This is called a compound, or aggregate, key.  It looks like this:</p>
<p><a href="http://blog.cloudeventprocessing.com/?attachment_id=951" rel="attachment wp-att-951"><img class="aligncenter size-large wp-image-951" title="cassandadm" src="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/05/cassandadm1.png?w=1024" alt="" width="725" height="272" /></a></p>
<p><code>create column family sbbo with column_type = 'Super' and comparator = 'BytesType'<br />
and column_metadata = [<br />
{column_name: bb, validation_class: UTF8Type},<br />
{column_name: bo, validation_class: UTF8Type},<br />
{column_name: bbSize, validation_class: UTF8Type},<br />
{column_name: boSize, validation_class: UTF8Type},<br />
{column_name: timeStamp, validation_class: LongType},<br />
];<br />
</code></p>
<p>And the insert statements look like this (we&#8217;re using the Seq# as the Super Column key &#8211; it comes right after the row key, &#8216;20100124:AA&#8217; below):</p>
<p><code>Set sbbo['20100124:AA'][1234]['bb']='100.00';<br />
Set sbbo['20100124:AA'][1234]['bo']='101.00';<br />
Set sbbo['20100124:AA'][1235]['bb']='101.00';<br />
Set sbbo['20100124:AA'][1235]['bo']='102.00';<br />
Set sbbo['20100125:AA'][1234]['bb']='100.00';<br />
Set sbbo['20100125:AA'][1234]['bo']='101.00';<br />
Set sbbo['20100125:AA'][1235]['bb']='101.00';<br />
Set sbbo['20100125:AA'][1235]['bo']='102.00';<br />
</code><br />
Now let’s see what’s in the column family:</p>
<p><code>List sbbo;</code></p>
<p>So now it looks like we’re able to store a set of quotes for a symbol for any given day.  Bingo.</p>
<p>All we’ve really done here is add another map – so we now have a map (Date, Symbol) that contains a map (Symbol, Quote) that contains another map (Quote, QuoteField).  Or, what we’ve done is figured out a way to represent the potentially sparse fact tables resulting from large data analysis (OLAP) projects in a concise and easily addressable fashion.  Told you it wasn’t that hard.</p>
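<p>If it helps to see the nesting spelled out, here is a rough sketch of the same shape as plain Python dictionaries: row key (date plus symbol), then Seq#, then quote field.  It only models the data; it does not talk to Cassandra, and the helper names row_key, get_day and get_range are hypothetical.  The range helper selects by Seq#, not by time of day.</p>

```python
from bisect import bisect_left, bisect_right

# row key "date:symbol" -> super column key (Seq #) -> {field -> value}
sbbo = {
    "20100124:AA": {
        1234: {"bb": "100.00", "bo": "101.00"},
        1235: {"bb": "101.00", "bo": "102.00"},
    },
    "20100125:AA": {
        1234: {"bb": "100.00", "bo": "101.00"},
        1235: {"bb": "101.00", "bo": "102.00"},
    },
}

def row_key(date, symbol):
    """Build the compound row key used in the inserts above."""
    return "%s:%s" % (date, symbol)

def get_day(cf, date, symbol):
    """All quotes for one symbol on one day, in ascending Seq # order."""
    row = cf.get(row_key(date, symbol), {})
    return [(seq, row[seq]) for seq in sorted(row)]

def get_range(cf, date, symbol, first_seq, last_seq):
    """Quotes whose Seq # falls between first_seq and last_seq, inclusive."""
    row = cf.get(row_key(date, symbol), {})
    keys = sorted(row)
    lo = bisect_left(keys, first_seq)
    hi = bisect_right(keys, last_seq)
    return [(seq, row[seq]) for seq in keys[lo:hi]]
```

<p>Calling get_day(sbbo, "20100124", "AA") walks one row in Seq# order, which is the same idea as reading a whole row from the Super Column Family.</p>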
<p><strong>GIVE ME MY DATA</strong></p>
<p>So, now that we’ve inserted a couple of rows of data, let’s see how to get our data.  From above, we want to:</p>
<ol>
<li>Get all the data for a day’s worth of a symbol, and</li>
<li>Get all the data for a slice of time during a day for a symbol</li>
</ol>
<p>Assuming you’ve entered the statements above to insert the data, we can retrieve an entire day’s worth of AA with this simple statement:</p>
<p><code>get sbbo['20100124:AA'];</code></p>
<p>Now that we’ve gone over some of Cassandra’s basics, we’ll get a little more into it in upcoming posts.  That&#8217;s where we&#8217;ll cover the goal in #2.</p>
<p><strong>THANKS FOR READING</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2011/01/27/cassandras-data-model/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Building a Back Testing Platform for Algorithmic Trading</title>
		<link>http://blog.cloudeventprocessing.com/2011/01/16/building-testing-platform-algorithmic-tading/</link>
		<comments>http://blog.cloudeventprocessing.com/2011/01/16/building-testing-platform-algorithmic-tading/#comments</comments>
		<pubDate>Mon, 17 Jan 2011 03:05:44 +0000</pubDate>
		<dc:creator>colin</dc:creator>
				<category><![CDATA[Algo Trading]]></category>
		<category><![CDATA[Capital Markets]]></category>
		<category><![CDATA[CEP]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Cassandra]]></category>

		<guid isPermaLink="false">http://cloudeventprocessing.com/?p=895</guid>
		<description><![CDATA[In this continuing series, I am examining thoughts and specific implementation details around building a back-testing platform for algo trading.  Eventually, we&#8217;ll see where complex event processing plays and how to implement it. Appendix to Part One &#8211; The Data &#8230; <a href="http://blog.cloudeventprocessing.com/2011/01/16/building-testing-platform-algorithmic-tading/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In this continuing series, I am examining thoughts and specific implementation details around building a back-testing platform for algo trading.  Eventually, we&#8217;ll see where complex event processing plays and how to implement it.</p>
<p><strong>Appendix to Part One &#8211; The Data Format</strong></p>
<p>Rather than looking at various database solutions first and then trying to define the problem in terms of those solutions, let&#8217;s first examine what market data looks like.  In its most simple form, market data looks like this (there&#8217;s usually a little more, but this is fine for our purposes):</p>
<ul>
<li>Date: The date of the market data,</li>
<li>Time: When did the quote occur during the date,</li>
<li>Sequence #: Most quote or trade streams include a sequence #,</li>
<li>Symbol: What security is this data for?</li>
<li>Best Bid: The best bid (we&#8217;re going to concern ourselves with BBO data for this series, it&#8217;s easier),</li>
<li>Best Bid Size: How much does someone want to buy at the Best Bid,</li>
<li>Best Offer: The best offer,</li>
<li>Best Offer Size: How much does someone want to sell at the Best Offer.</li>
</ul>
<p>Consider this chart:</p>
<p style="text-align: center;"><strong>Market Data</strong></p>
<p style="text-align: center;"><strong><a href="http://blog.cloudeventprocessing.com/?attachment_id=897" rel="attachment wp-att-897"><img class="size-large wp-image-897 aligncenter" title="Capture" src="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/05/capture11.png?w=1024" alt="" width="725" height="460" /></a><br />
</strong></p>
<p>If we break down the data, we can successively see how it might be arranged on disk for subsequent reading.  We want to read the data very quickly.  If we were using a standard relational database, it&#8217;s easy to see that we might be replicating some unnecessary data during the reads.  And if we use a typical columnar database, we can see that there are chunks that could be read together, increasing throughput.</p>
<p>For example, for any given millisecond (Time) in a quote feed, there may be more than one symbol with a quote.  In fact, that&#8217;s quite common.  So replicating the time stamp is superfluous.  So if we had a table for a date&#8217;s worth of data, then we&#8217;d have a Time column that was replicated throughout the table.  No reason to do that.</p>
<p>Looking again at the data, we can see that, for a given time, there might be multiple quotes available for multiple symbols.  We&#8217;d like to read those in order as a little group.  By organizing the data on disk as a flattened multi-dimensional map of maps, we would:</p>
<ol>
<li>Start with a given day (our table),</li>
<li>Start with a time (our row),</li>
<li>Read each quote in sequence # order (our column)</li>
<li>Process (do something)</li>
<li>Increment the time, and go to #2 above until we run out of data (lather, rinse, repeat)</li>
<li>Put the $ in the bank</li>
</ol>
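<p>The steps above can be sketched as a short loop.  This is plain Python over an in-memory map of maps, not a database read path; the function name replay_day and the sample quotes are made up for illustration.</p>

```python
def replay_day(day, process):
    """Walk one day's quotes in time order, then Seq # order within each time.

    day is a map of time -> {seq -> quote}; process is any callable.
    Returns the number of quotes handed to process.
    """
    count = 0
    for time in sorted(day):                  # step 2: pick the next time (row)
        quotes = day[time]
        for seq in sorted(quotes):            # step 3: read quotes in Seq # order
            process(time, seq, quotes[seq])   # step 4: do something
            count += 1
    return count

# Hypothetical sample: two symbols quoted in the same millisecond.
day_20100520 = {
    93000001: {2: {"symbol": "IBM", "bb": 125.10, "bo": 125.12},
               1: {"symbol": "AA", "bb": 12.34, "bo": 12.35}},
    93000000: {0: {"symbol": "AA", "bb": 12.33, "bo": 12.35}},
}

seen = []
replay_day(day_20100520, lambda t, s, q: seen.append((t, s, q["symbol"])))
```

<p>Here the process callback just records what it was given; in a back-test it would be the algo itself.</p>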
<p>If we could write this data structure to disk as we get it from the quote feed, and had fast enough disk, we could keep up with the feed.  If we needed to create some indexes on the data, we could easily do that as well.  We&#8217;d simply create another table that would hold an inverted list of time and sequence #&#8217;s by symbol.  If we want to process a day&#8217;s worth of data, we&#8217;re all set.  If we want to process a symbol, or group of symbols, we&#8217;re all set.</p>
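<p>As a rough illustration of that inverted list, the sketch below builds a symbol index over the same time and Seq# layout.  It is plain Python with made-up names (build_symbol_index, quotes_for) and sample data, assuming each quote carries its symbol.</p>

```python
from collections import defaultdict

def build_symbol_index(day):
    """Invert time -> {seq -> quote} into symbol -> sorted [(time, seq)]."""
    index = defaultdict(list)
    for time, quotes in day.items():
        for seq, quote in quotes.items():
            index[quote["symbol"]].append((time, seq))
    for positions in index.values():
        positions.sort()
    return dict(index)

def quotes_for(day, index, symbol):
    """Use the index to pull one symbol's quotes in time and Seq # order."""
    return [day[time][seq] for time, seq in index.get(symbol, [])]

# Hypothetical sample data
sample_day = {
    93000000: {0: {"symbol": "AA", "bb": 12.33}},
    93000001: {1: {"symbol": "AA", "bb": 12.34},
               2: {"symbol": "IBM", "bb": 125.10}},
}
symbol_index = build_symbol_index(sample_day)
```

<p>The index stores only positions, so the quotes themselves still live once, in the day table.</p>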
<p>So, to summarize, we need a hybrid approach.  In some places, we want rows of data &#8211; storing columns of data via a unique key.  In our case, that&#8217;s the Time column above for a given day.  The row above is Time, the column (or Super Column) is the Quote for a Symbol.  The Super Column&#8217;s key is the Sequence #.  Can anyone guess which database might fit nicely for this use case?</p>
<p>In my next post, I&#8217;ll describe a formalized data structure and its implementation.  I might even include a little code for all you #NoSQL guys and gals out there.</p>
<p>Thanks for reading!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2011/01/16/building-testing-platform-algorithmic-tading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a Back Testing Platform for Algorithmic Trading</title>
		<link>http://blog.cloudeventprocessing.com/2011/01/14/building-testing-platform-algorithmic-trading/</link>
		<comments>http://blog.cloudeventprocessing.com/2011/01/14/building-testing-platform-algorithmic-trading/#comments</comments>
		<pubDate>Fri, 14 Jan 2011 14:59:07 +0000</pubDate>
		<dc:creator>colin</dc:creator>
				<category><![CDATA[Algo Trading]]></category>
		<category><![CDATA[CEP]]></category>

		<guid isPermaLink="false">http://cloudeventprocessing.com/?p=876</guid>
		<description><![CDATA[In this series, I&#8217;m going to outline in general, how to build a back-testing platform for the creation, tweaking, and subsequent execution of algorithms used in electronic trading. Part One – The Data I recently made some comments on Vertica&#8217;s &#8230; <a href="http://blog.cloudeventprocessing.com/2011/01/14/building-testing-platform-algorithmic-trading/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In this series, I&#8217;m going to outline in general, how to build a back-testing platform for the creation, tweaking, and subsequent execution of algorithms used in electronic trading.</p>
<p><strong>Part One – The Data</strong></p>
<p>I recently made some comments on Vertica&#8217;s blog in regards to what I considered to be a fairly bold claim.  They said that Vertica was the only real column store.  But even if they are, so what? In my comments, I alluded to my belief that we optimize problems to solutions – we try to fix stuff using what we’ve got in our toolbox without having to run to Home Depot.</p>
<p>The real test is when the rubber hits the road &#8211; how do you actually solve a problem in a new way that&#8217;s motivating.  And by motivating I mean the solution addresses the issues, enables new capabilities, and is economically attractive.</p>
<p>So rather than tell you that DarkStar and our approach to processing both real-time and historical data (there&#8217;s a difference?) is the Real Enchilada, I thought I would illustrate a real world use case.</p>
<p>Let&#8217;s say you want to store a bunch of market data.  And I mean a bunch.  You want to store every piece of market data for the whole US Equities market.</p>
<p>And you&#8217;d like to have this data so that you can run analytics on it.  Or maybe even back-test strategies for buying and selling stocks.  So let&#8217;s assume that you&#8217;ve got some java code lying around to do that.</p>
<p>For our example, we are interested in seeing whether or not volume weighted average price (VWAP) strategies actually work.  We will pretend that we are buying a lot of stock, and the theory we want to test is whether or not buying that stock during the day when it&#8217;s lower than its volume weighted average price will give us a better average price during the day than just going with the flow (often referred to as volume participation).</p>
<p>We are all familiar with how relational databases work, and anyone who&#8217;s been in capital markets for a while knows how futile it would be to use something like Oracle for this due to cost, hardware and just how difficult it is to get the data into the database in the first place.</p>
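<p>To pin down what is being compared, here is a small sketch in Python.  It is only a toy under stated assumptions: trades are (price, shares) pairs, the participation rate is a made-up parameter, and the function names are hypothetical.  It is not the back-testing code this series builds.</p>

```python
def vwap(trades):
    """Volume weighted average price of (price, volume) pairs."""
    total_volume = sum(v for _, v in trades)
    return sum(p * v for p, v in trades) / total_volume

def participate(trades, rate):
    """Buy a fixed fraction of every trade's volume (volume participation)."""
    return [(p, v * rate) for p, v in trades]

def buy_below_vwap(trades, rate):
    """Buy the same fraction, but only when price is under the running VWAP."""
    fills, seen = [], []
    for price, volume in trades:
        # Compare against the VWAP of everything printed before this trade
        if seen and vwap(seen) > price:
            fills.append((price, volume * rate))
        seen.append((price, volume))
    return fills

# Hypothetical prints for one symbol: (price, shares)
trades = [(10.00, 100), (10.20, 300), (9.90, 200), (10.10, 400)]

baseline = vwap(participate(trades, 0.1))      # going with the flow
selective = vwap(buy_below_vwap(trades, 0.1))  # buying only under VWAP
```

<p>On this toy data the selective strategy gets a lower average price, but it also buys far fewer shares, which is exactly the kind of trade-off a real back-test over a full day of market data has to measure.</p>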
<p>Oh that&#8217;s right, I forgot to tell you, we are going to have to load this data first.</p>
<p>I am not going to go into the relevant benefits of a column store here either, you can check out many other websites for that.</p>
<p>Instead, let&#8217;s look at some issues.  First, I would rather load the data directly into the database as it happens.  Staging the data separately is costly and error prone. In addition, what happens when you decide to load that data and encounter a problem that can&#8217;t be fixed in time for the next market day?  What if you actually run out of space or compute to get caught up?  Well then you can&#8217;t back test the next day to further refine your algorithms.  Algo’s should be tweaked every day.  New algo’s need to be developed to remain competitive.  So here, a database error costs real money.</p>
<p>So I need a fast data store.</p>
<p>As I am loading the data, what happens if one of my disks goes boom? Or one of my machines goes boom? Well, now I have a problem.  If I fail over to another datacenter, how do I reconcile? What a nightmare!</p>
<p>So I need a data store that we can take a sledgehammer to and it will keep running.</p>
<p>Hey, if I have this big historical data store, I still need to query it while it is being updated.  Ideally, I would like to also be running analysis and back testing during the day.  Scheduling jobs to run at night is so very &#8217;90s.</p>
<p>So my data store has to facilitate both interactive query and batch analysis.</p>
<p>But wait, doing all of this means that I am going to have to figure out how to use the same code for back-testing that I use to generate orders during market hours.  It&#8217;s either that or use some visual, script based or different harnesses for my java or C++ code.  Yet another nightmare.</p>
<p>So, I would like to run the same code against my historical data store that I also use to generate orders during the day.</p>
<p>There&#8217;s a bunch of other stuff too, management, instrumentation, removing old data that I don&#8217;t need for back-testing, all the stuff we associate with normal day to day big data operations. We need to know what&#8217;s going on during the day so that we can be proactive.  There&#8217;s gold in that data!</p>
<p>And one last thing, it would be really cool if most of this technology wasn&#8217;t proprietary.  I mean let&#8217;s face it, firms that talk more about their investors on their websites than their clients can&#8217;t possibly have my best interests at heart.</p>
<p>This is a tall list.  Let’s knock it down, one by one.</p>
<p>Here is a diagram for your consideration.</p>
<p><a href="http://blog.cloudeventprocessing.com/?attachment_id=877" rel="attachment wp-att-877"><img class="alignleft size-medium wp-image-877" title="Capture" src="http://blog.cloudeventprocessing.com/wp-content/uploads/2011/05/capture.png?w=300" alt="" width="300" height="173" /></a></p>
<p>The diagram isn’t very technical, and that’s on purpose – I’m outlining an algorithm, or methodology that may or may not solve our problem.</p>
<p>In the diagram, I’ve depicted the database as a cluster of machines.  Instead of using one big machine backed by a SAN, I’m going to use a number of machines.  Each of those machines is going to connect to the Market Data source and get data.</p>
<p>As we receive the data, we’re going to take a peek at it and determine where in the cluster that data needs to live, and while we’re doing that, we’re going to write it to disk.  A background process will make sure that the data ends up on the node we want it on.  More on why that’s so incredibly important in Part Two – Analyzing the Data in this series.</p>
<p>Also, I’m going to ask the cluster to replicate everything we’re writing to it – we’re going to end up writing the data a total of 2 times in this example.  I might usually suggest 3, but we’ve got two data centers running the same solution, so I’ll actually have 4 copies of the data.</p>
<p>Why write the data to more than one node in the cluster?  First, if a node goes down, I still want to be able to write data.  If the node that goes down is the primary node, I’m going to remember that and when that node comes back up, I’m going to write all the data to it as part of its “Welcome Back to the Cluster Party!”  And second, if I’m reading data from the cluster (remember, we’ve got algo’s running and users querying this data), I want my data.  If a node goes down, your users don’t really care – they just want their data.  By replicating the data across multiple nodes, I achieve high availability without having to fail-over to another instance or data center.</p>
<p>Ok, we’ve got the Sledge Hammer test handled, which is cool, but everything I’ve described above sounds like it’s going to take a lot of time and that the system is going to be very slow.</p>
<p>Not true.  Each node in the cluster above is subscribing to market data.  So if one machine can ingest X messages per second, then a cluster of 10 machines should be able to ingest 10 * X messages per second, right?  Let’s see what that means in a real world example:</p>
<p>On May 20, 2010, there were about 1.1 billion BBO messages as published via SuperFeed (NYSE’s market data platform).  Those quotes represent the best bid, bid size, offer, and offer size for each stock at any given time.  In terms of messages per second, that’s about 50,000.  In terms of size per second, that’s about 4,500k per second.  Hmmm, chunky!</p>
<p>These are intimidating numbers.  But if we divide the problem up a bit, and use 10 nodes in a cluster, each node only needs to ingest about 450k per second across 5,000 messages.  All of a sudden, we’re dealing with something quite reasonable.</p>
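<p>The back-of-the-envelope math can be checked in a few lines of Python.  The message count, feed size and node count come from the figures above; the 6.5 hour trading session is my assumption for turning a daily count into a per-second rate.</p>

```python
MESSAGES_PER_DAY = 1.1e9          # BBO messages on May 20, 2010 (from the post)
TRADING_SECONDS = 6.5 * 60 * 60   # assumption: 9:30 to 16:00 regular session
K_PER_SECOND = 4500               # feed size per second quoted in the post
ROUNDED_RATE = 50000              # messages per second, as rounded in the post
NODES = 10

def per_second(messages, seconds):
    """Average message rate over the session."""
    return messages / seconds

def per_node(total, nodes):
    """Even split of a cluster-wide rate across nodes."""
    return total / nodes

feed_rate = per_second(MESSAGES_PER_DAY, TRADING_SECONDS)  # roughly 47,000
node_messages = per_node(ROUNDED_RATE, NODES)              # messages per node
node_k = per_node(K_PER_SECOND, NODES)                     # k per node
```

<p>This assumes the feed splits evenly and ignores replication; if every message is written twice, each node does roughly double that work, and real feeds burst well above the average.</p>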
<p>So, now we’ve got a cluster that can load the entire market real time and it’s redundant.  What about analyzing the data?  That’s in Part Two – Analyzing the Data which I’ll post next week.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2011/01/14/building-testing-platform-algorithmic-trading/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Predictions for 2011</title>
		<link>http://blog.cloudeventprocessing.com/2010/12/19/predictions-hell/</link>
		<comments>http://blog.cloudeventprocessing.com/2010/12/19/predictions-hell/#comments</comments>
		<pubDate>Sun, 19 Dec 2010 21:44:49 +0000</pubDate>
		<dc:creator>colin</dc:creator>
				<category><![CDATA[Opinion]]></category>
		<category><![CDATA[Predictions]]></category>

		<guid isPermaLink="false">http://cloudeventprocessing.com/?p=850</guid>
		<description><![CDATA[Some predictions for 2011.  In no particular order or importance. 1. CEP &#8211; The Feature There&#8217;s a couple of things going on here.  The most important being that Mark Palmer is writing blog posts about Richard Tibbetts writing blog posts &#8230; <a href="http://blog.cloudeventprocessing.com/2010/12/19/predictions-hell/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Some predictions for 2011.  In no particular order or importance.</p>
<p><strong>1. CEP &#8211; The Feature</strong></p>
<p>There&#8217;s a couple of things going on here.  The most important being that Mark Palmer is writing blog <a title="Mark Palmer" href="http://streambase.typepad.com/streambase_stream_process/2010/12/financial-software-engineering-the-flash-crash.html" target="_blank">posts</a> about Richard Tibbetts writing blog <a title="Richard Tibbetts" href="http://www.tabbforum.com/opinions/what-does-the-flash-crash-mean-for-financial-software-engineering" target="_blank">posts</a> on the Tabb Group&#8217;s site about writing better software on Wall Street (because software startups write better code and deal with bigger problems than firms on Wall Street, especially exchanges, do, right?&#8230;).  No customer win.  No &#8216;yet another use for CEP.&#8217;  Just good old-fashioned buzzword copy designed to remind people that there&#8217;s still one stand-alone CEP vendor. (click click click, is this link working?)</p>
<p>CEP will become a feature of larger, more established, horizontal offerings.  Because once the opportunities in Canada and South America dry up (you know those hotbeds for financial engineering, right? Canada and <a title="South America Has Money!" href="http://en.wikipedia.org/wiki/Economy_of_South_America" target="_blank">South America</a>?  You don&#8217;t?) the reality will sink in.  What&#8217;s that reality?  That no one in NYC is buying CEP engines for HFT trading anymore.  Why?  The CEP vendors don&#8217;t know why.  Even Apama has seen the light.  You can now process events one at a time, or within the context of their CEP engine.  Stunning.  They&#8217;re pushing the &#8216;platform.&#8217;  Good.  It&#8217;s about time <a title="Tibco" href="http://www.tibco.com/" target="_blank">Tibco</a> got some competition.</p>
<p>So this coming year won&#8217;t be the year of CEP, it will be the yesteryear of CEP, like, &#8220;Remember <a title="Nostalgia" href="http://en.wikipedia.org/wiki/Nostalgia" target="_blank">yesteryear</a>, when we all thought CEP was going to be really hot?&#8221;  CEP will become a feature found in Event Processing Platforms.  And we&#8217;ll finally start to see adoption of those platforms by large, household names.</p>
<p><strong>2. Hadoop &amp; Analytical Databases</strong></p>
<p>People are going to begin realizing that databases that incorporate map/reduce into their architecture will be *no faster* than Hadoop.  Why?  Go buy a book about <a title="Hadoop - The Definitive Guide" href="http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449389732/ref=sr_1_1?ie=UTF8&amp;qid=1292793842&amp;sr=8-1" target="_blank">Hadoop</a> and then sit down with a piece of paper and pencil.  Databases are designed to support multitudes of users all asking different questions.  And vendors would like to have us believe that they&#8217;re also all running long lasting jobs taking advantage of their shiny map/reduce implementations.  Long lasting.  As in, not interactive.  As in, batch jobs running on a map/reduce framework.  The bigger the job, the less increase in throughput the analytical database will be able to offer.  It&#8217;s physics.  Save your money.  So this coming year, we&#8217;ll probably see further consolidation and some unexpected exits in this area.</p>
<p><strong>3.  And Speaking About Hadoop</strong></p>
<p>Batch is dead as competitive advantage.  As <a title="Jeff Jonas" href="http://jeffjonas.typepad.com/" target="_blank">Jeff Jonas</a> loves to point out, and points out really well, data velocity is growing.  And the rate at which data velocity is growing is increasing.  And companies can&#8217;t process the data they have <a title="More Big Data Please" href="http://www.readwriteweb.com/enterprise/2010/11/executives-are-addicted-to-big.php" target="_blank">today</a>.  And many companies are actually making bad decisions with more data, not better decisions, why?  Because the data has lost most of its value once it&#8217;s been crunched.  You can only take a batch system like Hadoop so far.  But right now (at least for the near future) you actually still need to incorporate some ideas from the batch world for everything to come together.  So this coming year, we&#8217;ll see more people start treating Hadoop as either a must have to compete (minimal cost of entry) or &#8220;How f(*)(* much does that cluster cost to run?  There&#8217;s got to be a better way!&#8221;  It&#8217;s time to outsource your Hadoop cluster.</p>
<p><strong>4. Real Time &amp; Batch</strong></p>
<p>I&#8217;m not saying that Hadoop or Map/Reduce is a waste of time and money like some vendors who make outrageous claims like, &#8220;Google has stopped using map/reduce.&#8221;  That&#8217;s idiotic.  Please put the kool-aid down, you&#8217;ve had enough to drink.  What&#8217;s important is the ability to analyze data in flight, to make decisions while there&#8217;s still the opportunity to have an impact.  How does one accomplish this?  By having a context in which events are analyzed.  How is context built?  Via the constant processing of events in flight, constructing and augmenting context, and supplementing that context with the result of monster jobs run on gazilla-bytes of data (like that? gazilla- it&#8217;s mine).  So this coming year, we&#8217;ll see a focus on moving analytics to real time.  (deeper analytics than VWAP-please!)</p>
<p><strong>5. The Big Picture</strong></p>
<p>There have been some really neato-keeno entries in the visualization space.  Things like Tableau, Spotfire, etc.  But they&#8217;re great for analysis of relatively static data sources.  They&#8217;re not for real time stuff.  Even offerings from vendors like Panopticon, which can provide some insight into multidimensional data sources updating in real time, really offer quite limited analysis tools for big data.  So in the coming year, we should see more focus on real time data mining.</p>
<p><strong>6.  Did He Say, &#8220;Big Data?&#8221;</strong></p>
<p>Yup.  I said it.  I&#8217;ve heard people define big data as more data than will fit on one machine.  Those people haven&#8217;t worked on the machines I&#8217;ve worked on.  I define big data as &#8220;when you can&#8217;t turn your data into actionable intelligence fast enough to have an impact during the window of opportunity.&#8221;  Or something like that.  I&#8217;m not in marketing.  Common patterns for analyzing big data are emerging that tie as-it-happens analysis with context and historical data.  The lines are blurring.  It&#8217;s all just becoming data.  And business wants it all.  Even my father-in-law, in Germany, said, &#8220;Ja, SAP wants me to store my data in ze cloud.&#8221;  Everyone knows about big data and 2011 is going to be all about it.  Getting it.  Storing it.  Analyzing it.  Visualizing it.  And then we&#8217;ll see the real emergence of privacy issues, like what happens when our respective governments start using simple tools like the &#8216;People You May Know&#8217; feature from LinkedIn during child pornography investigations?  It&#8217;s going to happen.  Our government isn&#8217;t ready.  The legal infrastructure isn&#8217;t there.  Expect chaos on this front in 2011.</p>
<p><strong>7.  Computer Network Attack Platforms (CNA)</strong></p>
<p>In 2011, something important is going to happen.  The general populace will be made aware that, in addition to all the traditional, ground-based, we-can&#8217;t-win engagements in places like Afghanistan and Iraq, we&#8217;re also involved in a different kind of war.  One that rages on every day, one that runs 24&#215;7 and involves facets of technology present on earth, in space, and on the Internet.  And that&#8217;s cyber warfare.  World War III is here already &#8211; China has already &#8216;stolen&#8217; the Internet from the United States for about 15 minutes, diverting the majority of our most important Internet-based traffic through their country for storage and analysis.  Doesn&#8217;t that raise an eyebrow?  It should.  One of the biggest things that happened in 2010 went off without a hitch and without a great deal of coverage.  Iran&#8217;s nuclear power plant, the one everyone was afraid of, was rendered inoperable by a virus.  Because that plant is inoperable, Russia is going to continue to make big money.  And Israel is going to sleep easier.  Odd bedfellows, wouldn&#8217;t you say?  We&#8217;re going to learn more about CNAs in 2011.</p>
<p><strong>8.  What about NoSQL?</strong></p>
<p>Yawn.  It&#8217;s going to be a TWO BILLION DOLLAR market.  Just like CEP was.  Really.</p>
<p><strong>Happy New Year!</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2010/12/19/predictions-hell/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What&#039;s Wrong With Complex Event Processing?</title>
		<link>http://blog.cloudeventprocessing.com/2010/11/20/wrong-complex-event-processing/</link>
		<comments>http://blog.cloudeventprocessing.com/2010/11/20/wrong-complex-event-processing/#comments</comments>
		<pubDate>Sat, 20 Nov 2010 12:53:17 +0000</pubDate>
		<dc:creator>colin</dc:creator>
				<category><![CDATA[CEP]]></category>
		<category><![CDATA[Opinion]]></category>

		<guid isPermaLink="false">http://cloudeventprocessing.com/?p=843</guid>
		<description><![CDATA[I spend a significant amount of my time keeping up with advances in processing high velocity big data.  Over the last year, I&#8217;ve watched the NoSQL camp grow a lot.  And now, some folks are even forecasting a market approaching &#8230; <a href="http://blog.cloudeventprocessing.com/2010/11/20/wrong-complex-event-processing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I spend a significant amount of my time keeping up with advances in processing high velocity big data.  Over the last year, I&#8217;ve watched the NoSQL camp grow a lot.  And now, some folks are even forecasting a market approaching <a title="Wanna buy a bridge?" href="http://www.marketresearchmedia.com/2010/11/11/nosql-market/" target="_blank">$2 Billion USD by 2015</a>. The last time I saw that kind of trajectory for a new software category was for Complex Event Processing.  So without casting any undue aspersion on the NoSQL camp, let&#8217;s talk about why CEP has so dramatically failed to generate the returns venture capital firms were so sure they were going to achieve.</p>
<p><strong>WHAT IS EVENT PROCESSING?</strong></p>
<p>Event Processing, or Event Driven Architecture, means nothing more than processing events one at a time, preferably shortly after they occur.  The opposite of this is <a title="Not A Very Good Definition" href="http://en.wikipedia.org/wiki/Batch_processing" target="_blank">Batch Processing</a>, which means collecting events, or messages, or what most of the world would call rows of data, and processing them together.  In batches.  Sounds simple enough, right?  All of you reading this blog post have used an Event Driven Architecture.  In fact, you&#8217;re using one now &#8211; it&#8217;s in your browser.  Can you imagine what the user experience would be if your browser &#8216;batched&#8217; up all of your mouse clicks and submitted them every 30 seconds?  Event Driven Architectures promise the same type of agility and improved user experience for line of business and consumer applications that you&#8217;re experiencing right now.  In fact, it&#8217;s probably hard to think about using the web in batch mode &#8211; it just doesn&#8217;t make sense.</p>
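<p>The difference is easy to show in a few lines of Python.  This is only a sketch to illustrate the distinction; the function names <code>process_each</code> and <code>process_batched</code> and the use of strings as mouse clicks are invented for the example.</p>

```python
def process_each(events, handler):
    """Event-driven style: the handler sees every event as soon as it arrives."""
    for event in events:
        handler(event)


def process_batched(events, handler, batch_size):
    """Batch style: events are buffered and handed over together."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handler(batch)
            batch = []  # start a fresh buffer; the handler keeps the old one
    if batch:
        handler(batch)  # flush whatever is left over


clicks = ["click-1", "click-2", "click-3"]

handled_one_by_one = []
process_each(clicks, handled_one_by_one.append)

handled_in_batches = []
process_batched(clicks, handled_in_batches.append, batch_size=2)
```

<p>In the first case the handler can react to <code>click-1</code> before <code>click-2</code> even exists.  In the second, nothing happens until the buffer fills up or the input ends.</p>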
<p><strong>WHAT IS COMPLEX EVENT PROCESSING?</strong></p>
<p>For the most part, a marketing phrase.  That&#8217;s right &#8211; and again, for the most part, it&#8217;s completely meaningless.  As an early and continuing contributor to this particular area of technology, I remember when StreamBase, Apama, and others, myself included, called this field Event Stream Processing.  Then one of those firms&#8217; marketing departments decided to differentiate.  I&#8217;ll leave the specific firm to your intuition.  So, what is Event Stream Processing?  That&#8217;s much easier to answer.  Event Stream Processing is Event Processing with four additional key components:</p>
<p><strong>1. Continuous Query</strong></p>
<p>Rather than having to poll a server for an event, using ESP, the user of the system issues a query and is subsequently notified of events, aggregations, or patterns that satisfy the specifics of the query.  This happens continuously, until you stop the query.</p>
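<p>As a rough illustration of the register-once, get-notified-continuously model, here is a small Python sketch.  The <code>EventBus</code> class and its <code>register</code>, <code>stop</code>, and <code>publish</code> methods are hypothetical names for this example and are not taken from any ESP product, where the query would normally be written in the vendor&#8217;s own language.</p>

```python
class EventBus:
    """Hypothetical engine: standing queries are evaluated against each event."""

    def __init__(self):
        self._queries = {}
        self._next_id = 0

    def register(self, predicate, callback):
        """Issue a continuous query; returns a handle used to stop it later."""
        query_id = self._next_id
        self._next_id += 1
        self._queries[query_id] = (predicate, callback)
        return query_id

    def stop(self, query_id):
        """Stop a continuous query; later events no longer reach its callback."""
        self._queries.pop(query_id, None)

    def publish(self, event):
        """Push one event through every active query (no polling by the user)."""
        for predicate, callback in list(self._queries.values()):
            if predicate(event):
                callback(event)


bus = EventBus()
home_page_hits = []
query = bus.register(lambda e: e["page"] == "home", home_page_hits.append)

bus.publish({"user": "a", "page": "home"})
bus.publish({"user": "b", "page": "about"})
bus.stop(query)
bus.publish({"user": "c", "page": "home"})  # arrives after the query was stopped
```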
<p><strong>2. Windows (Time and/or Length)</strong></p>
<p>Using ESP, the user can ask, as an example, for an average value of some key over either a time or length window.  Something like, &#8216;Give me the average amount of time people have spent on the homepage in the last 10 minutes.&#8217;  This query would provide an updated average either continuously, or perhaps at regular intervals.</p>
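<p>A time window like the one in that query can be sketched in Python as follows.  This is a simplified illustration under stated assumptions: the class name <code>SlidingTimeAverage</code> is made up, timestamps are plain seconds supplied by the caller in increasing order, and the average is recomputed on every arriving event rather than at regular intervals.</p>

```python
from collections import deque


class SlidingTimeAverage:
    """Average of the values seen within the last window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value), oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        """Record one event and return the updated windowed average."""
        self._events.append((timestamp, value))
        self._total += value
        self._expire(timestamp)
        return self.average()

    def _expire(self, now):
        """Drop events that have fallen out of the time window."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value

    def average(self):
        if not self._events:
            return None
        return self._total / len(self._events)


# Seconds spent on the homepage, averaged over a 10 minute (600 second) window.
window = SlidingTimeAverage(window_seconds=600)
avg_1 = window.add(timestamp=0, value=30.0)
avg_2 = window.add(timestamp=300, value=60.0)
avg_3 = window.add(timestamp=700, value=90.0)  # the event at t=0 has expired
```

<p>A length window works the same way, except that events are expired by count (keep the last N) instead of by age.</p>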
<p><strong>3. Pattern Matching</strong></p>
<p>With Pattern Matching, I&#8217;m able to define a series of events that fit a pattern, and then be notified when that pattern is observed.  Usually within some Time or Length Window.  So, I might ask, &#8220;How many users are going from the &#8216;Home Page&#8217; to the &#8216;About Company Page&#8217; and then clicking on &#8216;My Profile&#8217; during a rolling 10 minute window?&#8221;</p>
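<p>Here is a minimal Python sketch of that kind of sequence detection.  It is deliberately naive and makes several assumptions that a real engine would let you configure: the <code>SequenceMatcher</code> class is invented for this example, the three pages must be visited back to back (any other page resets the user&#8217;s progress), and the whole sequence must complete within the window measured from the first page.</p>

```python
PATTERN = ("Home Page", "About Company Page", "My Profile")


class SequenceMatcher:
    """Detects, per user, a strict sequence of page views within a time window."""

    def __init__(self, pattern, window_seconds):
        self.pattern = tuple(pattern)
        self.window_seconds = window_seconds
        self._progress = {}  # user -> (index of next expected page, start time)

    def on_event(self, user, page, timestamp):
        """Return True when this page view completes the pattern for the user."""
        step, started = self._progress.get(user, (0, timestamp))
        if step > 0 and timestamp - started > self.window_seconds:
            step, started = 0, timestamp  # took too long: drop the partial match

        if page == self.pattern[step]:
            if step == 0:
                started = timestamp
            step += 1
        elif page == self.pattern[0]:
            step, started = 1, timestamp  # the sequence starts over
        else:
            step = 0  # an unrelated page breaks the sequence

        if step == len(self.pattern):
            self._progress.pop(user, None)
            return True
        if step == 0:
            self._progress.pop(user, None)
        else:
            self._progress[user] = (step, started)
        return False


matcher = SequenceMatcher(PATTERN, window_seconds=600)
results = [
    matcher.on_event("u1", "Home Page", 0),
    matcher.on_event("u1", "About Company Page", 100),
    matcher.on_event("u1", "My Profile", 200),   # completes within 10 minutes
    matcher.on_event("u2", "Home Page", 0),
    matcher.on_event("u2", "About Company Page", 100),
    matcher.on_event("u2", "My Profile", 700),   # too late, no match
]
```

<p>Counting the <code>True</code> results over a rolling window gives the &#8220;how many users&#8221; number from the question above.</p>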
<p><strong>4. A Language</strong></p>
<p>Tying all of the above together in a neat little language is a cool idea &#8211; it makes using these features easier. At least, that&#8217;s the theory.  And this is one place where CEP has gone wrong and is not the general computing revolution that I and others had hoped for.  I&#8217;ll expound upon this after a brief distraction in the next 2 paragraphs.  Please bear with me.</p>
<p><strong>WHERE IS COMPLEX EVENT PROCESSING USED?</strong></p>
<p>Even <a title="Hope he's got Apama shares yet..." href="http://streambase.com/about-links-markpalmer.htm" target="_blank">Mark Palmer</a>, who is usually extremely bullish about CEP and probably sprinkles it on his breakfast cereal, has recently admitted that <a title="A Brief Sojourn" href="http://streambase.typepad.com/streambase_stream_process/2010/11/how-broad-is-the-appeal-of-cep.html" target="_blank">CEP is only hot in Capital Markets</a>.  While I might disagree a bit with Mark, which is nothing new, I think we can all agree that CEP, at a total market size of $200M, is far smaller than we had all hoped for.  Frankly, it reminds me of the FIX engine vendor battles &#8211; I was an early provider there too &#8211; and we all ended up fighting over an ever shrinking market place.</p>
<p><strong>WHAT IS THAT MARKET ANYWAY?</strong></p>
<p>The current set of CEP vendors tends to focus on Capital Markets.  But not really.  It focuses on an even smaller slice of Capital Markets called High Frequency Trading.  It seems more people know about High Frequency Trading today than know about CEP. The important thing here is that what all the smart analysts are calling the &#8220;CEP Market&#8221; really isn&#8217;t the &#8220;CEP Market&#8221; at all.  It&#8217;s the HFT software market.  And again, looking at the impressively long list of clients that Mr. Palmer has cited in his recent blog post, many of those firms aren&#8217;t actually using CEP for HFT, but for an even smaller subset of functionality.  That&#8217;s why the market is so small &#8211; if any VC firm had thought the total addressable market for this technology was going to be $200M in 2010, no CEP startup would have received funding.  And when HFT finds the Next Big Thing, the CEP market, as defined today, will evaporate.  And along with it, any CEP vendor who has concentrated solely upon that market.</p>
<p><strong>SO WHAT HAPPENED?</strong></p>
<p>The idea was that we were ushering in a New Way To Compute Things.  Like all technologists who spend way too much time thinking about this stuff, we thought everyone would immediately see how smart we were, run out and buy one of the CEP based products, and join us in revolutionizing how data is turned into information and used by business folk to make money and pay our salaries.  The only problem is, we forgot 2 things: 1) who would be using our software to do this work, and 2) who would subsequently be using the applications developed by 1.</p>
<p><strong>DEVELOPERS &#8211; A FINICKY BREED</strong></p>
<p>I used to be a Real Developer &#8211; I wrote in C++.  Then Sun decided that the Internet was the Computer and we all started to learn Java.  Java is cool &#8211; Java makes it easy for anyone to write bad code whereas C++ really took some effort to mess things up.  More and more people started using Java for everything: servers, clients, web stuff, etc.  And now, I&#8217;m not sure what people use anymore &#8211; perhaps coders are using NoJava for all of their new shiny NoSQL apps.  I still use Java.  And I&#8217;m loath to learn another language.  See #4 above in &#8216;What&#8217;s CEP?&#8217;  I don&#8217;t want to learn another language.  And I certainly don&#8217;t want to move all of my work (servers, clients, webapps, etc.) to a new and unproven language.  And no matter which vendor you don&#8217;t choose for your CEP application because you write it all yourself anyway, none of their languages can claim to be broadly or generally adopted.  Proof?  Try to buy a book on one of them.  There are umpteen books out on NoSQL in less time than it took some CEP vendors to go out of business.  CEP vendors have failed to appeal to core IT departments.  Period.  And core IT departments are the folks who have to assemble all the crap they buy from vendors into something that business users get to complain about when it doesn&#8217;t work.</p>
<p><strong>BUSINESS USERS &#8211; &#8220;SHOW ME SOMETHING!&#8221;</strong></p>
<p>Business users want to see information.  They want to see information presented crisply, ready for decision making.  And today, more than ever, they want to see it on their web browser, iPad, iPhone, Droid, Apple TV, disconnected laptop, on flat panels on the front of their refrigerator in the kitchen and on the heads-up display in their car while commuting to work.  In short, they want information any time they want information so that they can function in what has become, and will continue to become, an ever faster and more connected world.  Even Progress Apama, who I think is doing really well, uses Flash based instrumentation.  No iPad for you!  There is no CEP environment that lets the IT folks build a complete application for the business user.  So the business user never &#8216;SEES&#8217; CEP.  So they&#8217;re not impressed.  They don&#8217;t get it.  And they don&#8217;t provide budget for stuff they don&#8217;t get.</p>
<p><strong>AND IN CONCLUSION</strong></p>
<p>CEP has failed to achieve the multi-billion dollar market forecasts that we all went out and raised money based upon because most vendors have failed to provide the education and tools necessary to create the complete user experience.  Most of the CEP vendors don&#8217;t even have their own visualization products &#8211; they partner with other vendors to provide things like Tree Maps or Dash Boards.  How they expect to revolutionize the world by outsourcing their interaction with the Business User is beyond me.</p>
<p><strong>THE LESSON?</strong></p>
<p>If the NoSQL camp would like to come anywhere close to realizing the crack smoking analysts&#8217; estimates of a $2B market, they should 1) make the technology readily accessible to the IT department (which they&#8217;re doing) and 2) make sure that the business user knows why it&#8217;s making a difference.  If they can reach out and touch the business user, or consumer, all the better.</p>
<p><strong>AND THANKS FOR READING</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2010/11/20/wrong-complex-event-processing/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Flash Crash &#8211; HFT Not To Blame so What Next?</title>
		<link>http://blog.cloudeventprocessing.com/2010/10/27/flash-crash-hft-blame-next/</link>
		<comments>http://blog.cloudeventprocessing.com/2010/10/27/flash-crash-hft-blame-next/#comments</comments>
		<pubDate>Wed, 27 Oct 2010 14:12:11 +0000</pubDate>
		<dc:creator>colin</dc:creator>
				<category><![CDATA[Opinion]]></category>
		<category><![CDATA[RegNMS]]></category>
		<category><![CDATA[FlashCrash]]></category>

		<guid isPermaLink="false">http://cloudeventprocessing.com/?p=834</guid>
		<description><![CDATA[Much to the SEC&#8217;s consternation, the recent report detailing the causes of the May 6th, 2010 Flash Crash has failed to indict High Frequency Trading as the cause. BUT WE WANT THE MONEY How does all of this tie in &#8230; <a href="http://blog.cloudeventprocessing.com/2010/10/27/flash-crash-hft-blame-next/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Much to the SEC&#8217;s consternation, the recent report detailing the causes of the May 6th, 2010 Flash Crash has failed to indict High Frequency Trading as the cause.</p>
<p><strong>BUT WE WANT THE MONEY</strong></p>
<p>How does all of this tie in with the SEC&#8217;s bid to build a huge consolidated audit trail (CAT)?  Well, after a lot of thought, I don&#8217;t really know.  And I haven&#8217;t read a lot from anyone that purports to know either.  All I know is the SEC wants a lot of money to build it because, well, we need it to watch all of those evil HFT firms out there disrupting the market.</p>
<p><strong>REG NMS IS BROKEN</strong></p>
<p>So here&#8217;s a relevant question: if HFT wasn&#8217;t to blame for the Flash Crash, then what was?  I thought Reg NMS was supposed to ensure that, no matter how many exchanges enter the fray, we were all supposed to get the best price &#8211; regardless of where the order executed.  And that all of the exchanges would cooperate and there would be peace and harmony in the valley.  Guess what, that&#8217;s not working.  It would seem that the &#8216;structure&#8217; of the market needs a little attention.</p>
<p><strong>IF IT&#8217;S BROKEN, WHAT ARE THEY REGULATING?</strong></p>
<p>So, if HFT isn&#8217;t to blame for the crash, and it boils down to some idiot shorting billions of dollars of futures in one fell swoop, then something else must be broken.  And the SEC&#8217;s CAT will fix it, right?  Wrong.  All the SEC&#8217;s CAT will do, once it&#8217;s live in 4 years and after billions of dollars, is confirm yet again that HFT isn&#8217;t responsible for any other flash crashes either.  So why spend the money?</p>
<p><strong>BECAUSE I WANT AN EMPIRE</strong></p>
<p>Could it be that the SEC, which didn&#8217;t even have the expertise required to answer the basic question, &#8220;WTF happened to the market please,&#8221; is looking to build an empire?  Is this the same organization that seems to dispense different levels of disciplinary action based upon how deep the offending firm&#8217;s pockets are?  Say it isn&#8217;t so!  Asking for billions of dollars represents a multiple of the SEC&#8217;s current operating budget.  Just what exactly do they plan on doing with all that money?  And how many firms have offered to build the capabilities for a lot less?  If the SEC doesn&#8217;t even know what the problem is, how do they know how much money to ask for and what system needs to be built?  Madness.</p>
<p><strong>I AM NOT IMPRESSED</strong></p>
<p>I&#8217;d really like to hear how the SEC&#8217;s going to fix the market&#8217;s structure and prevent this from happening again.  Then I&#8217;d like to see actual enforcement of the laws on the books and see people who violate these laws go to jail instead of write checks.  In short, what I&#8217;d like to see is the SEC start doing their job before asking for billions of dollars to do something that wouldn&#8217;t have prevented what happened in the first place.  Please.  Do the SEC and Mary Schapiro think we&#8217;re all just that stupid?</p>
<p><strong>THAT&#8217;S RIGHT, I SAID IT</strong></p>
<p>So not only am I going to go on record with the above (go back and read it again &#8211; the SEC isn&#8217;t doing their job), but I&#8217;m also going to say that event processing technology can&#8217;t prevent future occurrences of the Flash Crash (and CEP vendors who think it can really don&#8217;t know what they&#8217;re talking about).  Several vendors (YOU KNOW WHO YOU ARE) are all too happy to <strong>not </strong>point out that the emperor&#8217;s new clothes, well, need some additional tailoring as they jump on the SEC and CFTC&#8217;s bandwagon, hoping for additional, regulation-sourced revenue.</p>
<p><strong>THE SEC COULD BE A HERO</strong></p>
<p>I believe that if the SEC started doing their job again, and investor confidence returned to the equity markets, maybe we&#8217;d see a return to previous volumes.  Because right now, at the volume levels out there today, there&#8217;s going to be blood in the streets as firms collapse under the lack of trading.  All the SEC has to do is their job.  And be 1/10th as vocal about it as they are about their request for billions of hard earned taxpayers&#8217; dollars.</p>
<p><em>Thanks for reading!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloudeventprocessing.com/2010/10/27/flash-crash-hft-blame-next/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

