Background
A couple of weeks ago, the second CTP for SQL Server 2008 R2 was
released. I downloaded it immediately. Still the product is far from
finished, and the CTP didn't contain any preview of the "Gemini"
component, but I could at least test the StreamInsight part. The
finished product is scheduled to be released during the first half of
year 2010, according to Microsoft's web site.
[Notice: This article has been updated for the November CTP of StreamInsight]
What's new in SQL Server 2008 R2? I would say the most important news are:
- StreamInsight
- Managed Self Service Business Intelligence (the so called "Gemini" project)
- Application and Multi-Server Management
This blog post will focus on StreamInsight.
StreamInsight
What is StreamInsight and what is it good for? StreamInsight is a
platform for developing and deploying applications that handle
high-speed streaming data.
It could be used for near real-time processing of data from production
environments, structured data such as financial information and
unstructured data such as Facebook, Twitter and blogs. Multiple sources
can be combined and refined before they are being output. This
technology is called
CEP - complex event processing.
Combine StreamInsight with data mining, and you can have real-time
fraud detection, stock market prediction, you name it... Imagine a
real-time Business Intelligence-application where the management can
actually see the KPI gauges moving on the screen. I would say that
monitoring real-time flow of information is one of the key success factors for tomorrow's Business Intelligence.
This is also supported by Ralph Kimball, saying in an interview that
Business Intelligence is moving from the strategic level towards
operational level processes such as customer support, logistics and
sales. At the operational level data must be continuously updated, not
just once per day. I would add also that the new generation, that has
grown up with Facebook and Twitter, will make it necessary to monitor
new sources for successful Business Intelligence.
How does it work?
You develop your application in Visual Studio 2008, using libraries
from StreamInsight. Basically you develop (or use existing) input and
output adapters and connect them through LINQ queries. The picture below
(copied from Microsoft's StreamInsight documentation) shows a good
overview of the product:
(Click on image to enlarge)
There are three different development models supported by StreamInsight:
- Explicit model, which provides full control and access to the CEP-server.
- Implicit model, which can handle only a single query and runs all in memory. Good for debugging.
- IObservable/IObserver interface, which hides much of the complexity of the CEP-server API
All data in StreamInsight is organized into event streams. The events are of two kinds:
- INSERT events, that contains a header with timestamp information and
a payload with application specific data. These events can be of
different models, such as Interval, Point and Edge, to support different
temporal characteristics.
- CTI events, that indicates the completeness of previous events in the stream.
Adapters can be of two types:
- Untyped adapters that work with multiple types of INSERT events
- Typed adapters that work with a single type of INSERT events
Queries are written in LINQ with the possibility to do calculations,
filtering, grouping, windowing, aggregations, combining and invoke
user-defined functions. A query template is bound to specific input and
output adapters to produce a query instance.
An example - building an input adapter for Yahoo Finance stock quotes
For testing StreamInsight, I chose to build an input adapter that
reads stock quotes from Yahoo Finance. Note however that quotes from
Yahoo Finance are delayed at least 15 minutes unless you pay for their
premium service. To test the input adapter I also built an output
adapter the displays the quotes on the screen (just a simple console
application). These I joined together in an application that I built
using the implicit development model.
The point here is not the functionality of the application, but that
you could easily replace the output adapter with something much more
useful. By adding SQL Server 2008 Data Mining, using any algorithm for
predictive analysis, you could build something much more useful. Adding
an output adapter that places buy and sell orders into a trading system
could make you rich. Maybe I will write about that in a later blog,
maybe not...
The picture below shows the Visual Studio 2008 project.
(Click on image to enlarge)
ScreenScraper is a helper class that reads the HTML contents
of a web page and extracts a string using pattern matching (a regular
expression).
YahooFinance is a helper class that provides the
ScreenScraper with suitable URL and pattern matching expression for the
desired stock symbol.
StockPayload contains the actual data and is defined as below:
public class StockPayload
{
///
/// Unique ID of stock or index
///
public string StockID { get; set; }
///
/// Current value of stock or index
///
public double Value { get; set; }
}
The data members can only be basic types such as numbers, strings,
etc. This is probably because the serialization mechanism in
StreamInsight requires a fixed size payload.
The initial rows of the
input adapter is shown below.
(Click on image to enlarge)
Note that the input adapter inherits generic class TypedPointInputAdapter, which is used for
typed (single kind of event)
point (point in time events) adapters. Basically the adapter has to implement methods for
Start() and
Resume(), monitor the
AdapterState and
enqueue
events to the output stream. The input adapter can work with
asynchronous sources through callback functions, and synchronous
sources. I've implemented the input adapter as a loop that reads data
from Yahoo Finance and then waits for a determined time. A CTI event is
sent after each quote, to indicate there are no other events awaiting.
A
factory class is provided to instantiate the adapters. The factory implements a
Create method that takes a configuration and an event shape as parameter. Based on these parameters it instantiates the input adapter.
The
output adapter is built similarly as the input adapter, so
I don't describe it here. The adapters are tied together by a query
defined by the following statements. Note that this solution is specific
for the
implicit development model. In the explicit model, the query would instead be bound through the CreateQuery method of the application.
// Join input adapters with a simple LINQ query
var combinedInputStream = (from e in input1Stream
select e).Union
(from e in input2Stream
select e);
// Connect input adapters with output adapter
var query = combinedInputStream.ToQuery(outputSink);
Finally the query is being run:
// Run the query
query.Start();
adapterStopSignal.WaitOne();
query.Stop();
Testing and debugging
SQL Server 2008 R2 StreamInsight comes with a tool,
StreamInsight Debugger,
that can visualize a recorded event flow. To use it you first need to
write tracing data to a file. Enable tracing, run your application, and
the disable tracing. This is done from command-line as shown below:
Opening the trace data file, sample.etl, in StreamInsight Debugger
you can monitor the stream pipeline. The picture below shows data from
the input adapter instances and how they are unioned in the LINQ query.