Tuesday, November 11, 2008

State, events, time - a view on the confusion around CEP

CEP

Over the past few months, the EDM world I work in has seen a lot of noise generated by the arrival of CEP - "Complex Event Processing" - and the impact it has had in provoking soul searching in the BRMS world and, to a lesser extent, in the EDA, ESP and other E(x)(x) worlds.
The "E" in these E(x)(x) is "events" which are of course at the core of CEP.
But so is "complex" and "processing", both of which lead more to the area of EDM or BRMS.

This is one of those situations in which a technology addressing, at its core, a valid set of concerns gets dropped into the middle of a complex soup of acronyms, and confusion ensues.

Of course, the CEP specialists should bear with me for the duration of this post. I am aware that CEP has been around for a long time, etc., but it is also clear that it is undergoing a renewal through its adoption by the big platform vendors (IBM and Oracle acquisitions) and the innovative ones (Tibco, JBoss).

Why am I writing this post? Essentially because I believe the confusion is self-inflicted: as an industry (enterprise applications), we have not been careful to focus the terminology on its key area - events - and we have spent too much energy trying to justify the "complex" and "processing" aspects.

While some of the points below touch on semantic confusion (big words), I will try to remain pragmatic. Let's take "events" and "complex processing" in turn.

"Events"

It is true that "events" and their semantics are quite important in systems that need to make decisions.

This is not new; it has always been the case. So why are we - the enterprise software world - only recently starting to give "events" the preeminence they already have in other worlds, such as the real-time systems world I started in?

Well, simply because the enterprise world has a way of capturing much of the value of events by translating them into state.
Traditionally, the occurrence of events - and in sophisticated systems, even their sequencing and timing - is encoded in the state of the system, transforming the "event" management problem into the dual "state" management problem.
Take an example: a fraud detection system is of course keenly interested in knowing what happened, where, when, and in which order. It is no surprise that most CEP vendors use fraud detection as a key example.
Well, guess what? Fair Isaac's Falcon Fraud Manager, by far the most used credit card transaction fraud detection system, does not use CEP as defined by the current vendors - and neither do a host of other fraud detection systems. These systems translate events into profile (read "state") information, and they enrich the profiles at run time using sophisticated business rules (not even production rule systems). The profile encodes variable values targeted precisely to capture the essence of the business semantics of the events: "number of purchases of gas using the same credit card, charging less than a certain amount, within a window of time and a geographic region".
You could say that what they have is a sophisticated, hugely scalable (90%+ of all credit card transactions in the US go through Falcon), stateful decision management system, supported by a powerful "cache" of variables computed through complex business rules.
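To make the "events into state" point concrete, here is a minimal sketch, in Java, of a profile variable of the kind quoted above - a sliding-window count of small gas purchases per card and region. All names are mine and purely illustrative; this is in no way Falcon's actual design.

import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Minimal sketch of the "events into state" pattern described above:
 * each transaction event is folded into a per-card profile, and the
 * profile exposes a derived variable such as "number of small gas
 * purchases in a region within a time window".
 * All names are illustrative - this is not Falcon's design.
 */
public class CardProfile {

    /** An incoming transaction event, reduced to the fields we need. */
    public static final class Txn {
        final long timestampMillis;
        final double amount;
        final String merchantCategory; // e.g. "GAS"
        final String region;           // e.g. a city or zip prefix

        public Txn(long timestampMillis, double amount, String merchantCategory, String region) {
            this.timestampMillis = timestampMillis;
            this.amount = amount;
            this.merchantCategory = merchantCategory;
            this.region = region;
        }
    }

    private final Deque<Txn> recentTxns = new ArrayDeque<Txn>();
    private final long windowMillis;

    public CardProfile(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Fold one event into state: the "event -> profile" translation. */
    public void update(Txn txn) {
        recentTxns.addLast(txn);
        // Expire events that have fallen outside the sliding time window.
        while (!recentTxns.isEmpty()
                && txn.timestampMillis - recentTxns.peekFirst().timestampMillis > windowMillis) {
            recentTxns.removeFirst();
        }
    }

    /**
     * Derived "profile variable": gas purchases below a threshold,
     * in the given region, within the current window.
     */
    public int smallGasPurchasesIn(String region, double maxAmount) {
        int count = 0;
        for (Txn t : recentTxns) {
            if ("GAS".equals(t.merchantCategory)
                    && t.region.equals(region)
                    && t.amount < maxAmount) {
                count++;
            }
        }
        return count;
    }
}

The point is that the business semantics of the events - same card, small amounts, short window, same region - live entirely in how this piece of state is derived; no E(x)(x) stack is involved.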
And there are many cases like that.
The reality is that the "enterprise" world has been dealing with events forever. It has simply not needed to resort to any E(x)(x) notion / stack / etc.

That being said, I believe there is an important piece in the "event"-centric expression of logic: the separation of the event processing logic from the rest, and the resulting clarity, elegance, maintainability, and ultimately all the qualities - scalability, auditability, robustness, ... - that come from clean concepts, clean designs, clean architectures.

Which brings me to the following first opinion on CEP:
(1) CEP should stop worrying about the origin of the events and focus on the events themselves. It does not matter how the events originate, and the issues of ESB, EDA, ESP, in-database generation, etc., are all orthogonal / independent of how event-dependent logic is managed.
And to the second opinion on CEP:
(2) CEP should stop worrying about caching. Yes, caching is important, but it is irrelevant to the power of the approach - witness the fact that many of the largest and most scalable event-driven enterprise apps handle the issue without any need to couple the event management piece with the management of caches. Right now, there are efforts to extend the Rete structures and to adapt the algorithm to build this cache in a more efficient way for the corresponding type of rules processing. That is a great usage of the technology, but it will not make the potential power of event-centric approaches any more compelling.

Maybe it's time to be a little more constructive.

(3) CEP would do wonders in enterprise systems if it focused all its attention on the "event" part: the semantics of events, the richness of the event context (time, location, referential, ...), the clean semantics of the key contextual notions (time operations including referential management, etc.), and so on.

The event semantics question is not an innocent one, and it is absolutely not settled - witness the numerous exchanges on this subject between people much cleverer than me.

I once had a discussion with Paul Haley on this subject, and we went into the "what is an event" question, which has the virtue of quickly getting people upset. It is a valid question: the definition "event = observed state transition" has the bad taste of defining events in terms of state, but its key issue in my eyes is that it presupposes observation of a state and lacks content.
The value of events is that they have an intrinsic context that is not naturally captured in state-based systems: they occur at a point in time with respect to a given referential - or more generally, they occur at a contextual point (which could be time + location, etc.) with respect to a given referential. Different events used within the same system may have their intrinsic context expressed with respect to different referentials - and that will be the default case in any distributed system.
Events occur; they are atomic and immutable. Their only existence is their occurrence. We may track the fact that they occurred, but an event instance only happens once and is instantaneous. An event does not last a duration - a durable event makes no sense to me; its effects, or the state change it triggers, may last a duration, but the event in itself is instantaneous.
Which leads to the fact that there are natural event correlations you want to express (not just discover): an event creates a state transition, and a correlated event will create another state transition that brings the state back to the original one.
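To make that view concrete, here is a minimal sketch, in Java, of what an event could look like as a type under these assumptions: it is immutable, its occurrence is instantaneous, and it carries its intrinsic context together with the referential that context is expressed against. The names are mine, not a proposed standard.

/**
 * Sketch of an event as described above: it occurs once, it is immutable,
 * it is instantaneous, and it carries its intrinsic context (here an
 * occurrence point plus the referential that point is expressed against).
 * Purely illustrative.
 */
public final class Event {

    /** The referential the occurrence point is relative to. */
    public static final class Referential {
        private final String id; // e.g. "UTC" or "store-42-local-clock"
        public Referential(String id) { this.id = id; }
        public String id() { return id; }
    }

    private final String type;             // e.g. "CardSwipe"
    private final long occurredAt;          // instantaneous occurrence point
    private final Referential referential;  // what occurredAt is expressed against

    public Event(String type, long occurredAt, Referential referential) {
        this.type = type;
        this.occurredAt = occurredAt;
        this.referential = referential;
    }

    // No setters: an event only happens once, and its occurrence cannot change.
    public String type() { return type; }
    public long occurredAt() { return occurredAt; }
    public Referential referential() { return referential; }
}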
This is just my opinion - but if you talk to more than one real specialist, you will get more than one view. Not a good sign of maturity of the concepts.

Clean ontologies / semantics / etc. needed.

Once the semantics of events and their intrinsic referential-dependent context are clarified, we need to focus on what we want to express about events - and for that, we need to bring in the enterprise application experts.
There is a lot to learn from the real-time systems experts - refer to the very old but incredibly good insights in Parnas' work. There is a lot to take back from the original event correlation systems - many of them built with products such as Ilog's rules engine. These could even be said to be the purest predecessors of what CEP attempts to do.

What this will end up doing is giving us - the decision management world - a very powerful tool to "naturally" express logic on events, with their referential-dependent context, and to do so in a way that enables true management (including things like verification of the logic), powerful optimizations, etc.


I honestly do not think we are there, and I would really like to see the standardization world - the OMGs and others - help us get there; but I do think we need the enterprise business app drivers. We had those drivers in real-time systems: the military and transportation apps.

I would love the specialists to prove me wrong and to show me we are there.


"Complex processing"

This will be shorter.

As stated above, I believe that:
- CEP should narrow its processing ambitions. One approach is to focus the purpose of its processing on clear outcomes - such as what the correlation engines did 20 years ago. For example, we could say that CEP's processing is about processing / correlating events to generate higher-order events: transaction events to generate a potentially fraudulent transaction event. I will call these "ambient events" and "business events": the CEP processing goal is to translate ambient events into business events (a minimal sketch of this translation follows this list).
- CEP should focus the complexity of its processing on these revised ambitions.
- CEP should leave all issues related to event streaming, event transport, event communication, etc. to other layers.
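As announced in the first point above, here is a minimal sketch, in Java, of the ambient-events-to-business-events translation, using the fraud example: low-level card swipe events come in, and a higher-order "suspected fraud" event comes out when too many swipes on the same card fall within a time window. Class names and threshold logic are purely illustrative; no vendor's API is implied.

import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of the "ambient events -> business events" translation: low-level
 * card swipe events come in, and a higher-order "suspected fraud" event is
 * emitted when too many swipes on the same card fall within a time window.
 * Names and threshold logic are illustrative; no vendor's API is implied.
 */
public class FraudCorrelator {

    /** Low-level, "ambient" event. */
    public static final class CardSwipe {
        final String cardId;
        final long occurredAtMillis;
        public CardSwipe(String cardId, long occurredAtMillis) {
            this.cardId = cardId;
            this.occurredAtMillis = occurredAtMillis;
        }
    }

    /** Higher-order, "business" event produced by the correlation. */
    public static final class SuspectedFraud {
        final String cardId;
        final int swipeCount;
        public SuspectedFraud(String cardId, int swipeCount) {
            this.cardId = cardId;
            this.swipeCount = swipeCount;
        }
    }

    private final Deque<CardSwipe> window = new ArrayDeque<CardSwipe>();
    private final long windowMillis;
    private final int threshold;

    public FraudCorrelator(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /**
     * Feed one ambient event; returns a business event if the correlation
     * fires, or null otherwise (a real system would publish it instead).
     */
    public SuspectedFraud onSwipe(CardSwipe swipe) {
        window.addLast(swipe);
        // Drop swipes that have left the time window.
        while (!window.isEmpty()
                && swipe.occurredAtMillis - window.peekFirst().occurredAtMillis > windowMillis) {
            window.removeFirst();
        }
        int count = 0;
        for (CardSwipe s : window) {
            if (s.cardId.equals(swipe.cardId)) {
                count++;
            }
        }
        return count >= threshold ? new SuspectedFraud(swipe.cardId, count) : null;
    }
}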

I may be a purist, but I see a simple picture (sketched in code after this list):
- leave anything related to transport and communication to other layers
- use this revised CEP to express and execute event-relevant logic, the purpose of which is to translate the ambient events into relevant business events
- have these business events trigger business processes (however lightweight you want to make them)
- have these business processes invoke decision services implemented through decision management to decide what they should be doing at every step
- have the business processes invoke action services to execute the actions decided by the decision services
- all the while generating business events or ambient events
- etc.
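Here is a minimal sketch, in Java, of the shape of that picture, with invented interface names (none of this corresponds to a real product's API): the narrowed CEP piece turns ambient events into business events, a lightweight process step asks a decision service what to do, and an action service executes the decision, possibly generating new events.

/**
 * Sketch of the picture above, with invented interface names: the narrowed
 * CEP piece turns ambient events into business events, a lightweight process
 * step asks a decision service what to do, and an action service executes
 * the decision, possibly generating new events. Not a reference architecture.
 */
public class Pipeline {

    public interface AmbientEvent {}
    public interface BusinessEvent {}
    public interface Decision {}

    /** The revised, narrowed CEP: ambient events in, business events out. */
    public interface EventCorrelator {
        Iterable<BusinessEvent> onEvent(AmbientEvent event);
    }

    /** Decision management: a pure decision, no side effects. */
    public interface DecisionService {
        Decision decide(BusinessEvent event);
    }

    /** Action execution: side effects live here and may emit new events. */
    public interface ActionService {
        Iterable<AmbientEvent> execute(Decision decision);
    }

    private final EventCorrelator correlator;
    private final DecisionService decisions;
    private final ActionService actions;

    public Pipeline(EventCorrelator correlator, DecisionService decisions, ActionService actions) {
        this.correlator = correlator;
        this.decisions = decisions;
        this.actions = actions;
    }

    /** One turn of the loop: the "business process" step, kept trivially lightweight. */
    public void handle(AmbientEvent event) {
        for (BusinessEvent businessEvent : correlator.onEvent(event)) {
            Decision decision = decisions.decide(businessEvent);
            for (AmbientEvent generated : actions.execute(decision)) {
                handle(generated); // in a real system these would go back onto a bus, not recurse
            }
        }
    }
}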

As such, CEP will include a semantically sound event-with-intrinsic-referential-dependent-context model, a corresponding language (an EPL or vocabulary) to express logic, algorithms to execute it efficiently (a wide-open field - tons of people doing analytics, Bayesian approaches, rules, ...), techniques to verify it (wide open - and fairly empty: I only know of the real-time folks), etc.

And there, the value of CEP is clear. Of course, it is lower than what CEP vendors would like, but significant anyway.

I am hoping this is controversial enough to bring on flames...

2 comments:

Eastwood said...

I'm a very pragmatic sort of guy so I like what you lay out here. At this point there are clearer distinctions between BRMS and BPM and we need them for CEP as well. There is clearly a space to play for CEP. As a former telecom guy the alarm filtering and correlation use case works well for me. Once one of these higher-order events is created (sorry, forgot what you just called them), then a decision service might be appropriate to determine an action or further diagnosis.

michaelvk said...
This comment has been removed by the author.