June 10, 2007

Data-centric architecture

This is also based on a recent post on the Yahoo! SOA mailing list, modified somewhat.

One complaint about RESTful approaches to software architecture is that it takes a difficult investment to start looking at a legacy system in terms of "Resources". Many transactional interfaces already look like services or components, so a shift to WS-* style SOA tends to be easier to adopt.

I see large amounts of work undertaken to "SOA enable" one's transactional systems into more business-relevant services, using every manner of infrastructure (BPM, ESB, Data Services, etc.). Usually this is part of a larger initiative (as "SOA for its own sake" tends to be a very hard sell).

The problem is that, in my experience, shifting an IT department's mindset towards SOA tends to require a lot of architectural change. Many transactional interfaces are at the wrong granularity, or have disjoint, overlapping semantics with other systems that evolved independently but now require integration. How an organization goes about this varies:

  1. Some are throwing out their old applications and buying packages like SAP (which want to SOA-your-world). This is often $100m+ of work.
  2. Others are rebuilding their systems on Java or .NET, perhaps with some best-of-breed packages to fill in some areas. Again, this can be many $m.
  3. Many are just layering service infrastructure on top of the old stuff but doing a big rethink as to how to re-route access through the new layer. Fewer $m, but still significant.

I don't think the issue is a lack of desire for investment in new infrastructure and in re-thinking. That's happening with SOA, to some degree. I think the reason for this disconnect is probably more fundamental, and seems to lie with the education and values of IT architects, similar to the eternal pendulum debates of behaviour-centric vs. data-centric design.

Here is my take on the disconnect:

1. REST approaches are data-centric. They isolate the important properties of data -- identifiers, provenance, temporal relevance -- and single them out as some of the most important aspects of a shared information system architecture.

Anyone that has dealt with data quality, data warehousing, etc. knows that this is a huge problem, but it is often ignored outside of small circles in the enterprise. Perhaps this is why so much integration is still accomplished through ETL and batch transfer -- the people doing that work are the ones that pay attention to the semantics of data & integrity of the identifiers ;-)

Roy, in his thesis, even underlines this in Chapter 1, noting that the vast majority of software architecture -- even in the academic community! -- ignores studying the nature of data elements. His conclusion -- "It is impossible to evaluate [a network-based application] architecture without considering data elements at the architectural level."

COM, CORBA, WS-*, MOM, etc. look at the data elements as messages. They are envelopes, like IP. They don't consider data elements beyond this: send whatever data you want, deal with data issues your way.

REST, on the other hand, looks at this explicitly, even covering data stewardship -- ("Cool URIs don't change", and "The naming authority that assigned the resource identifier, making it possible to reference the resource, is responsible for maintaining the semantic validity of the mapping over time.")

The bright side is that these differences don't preclude COM, CORBA, WS-* from adopting constraints that explicitly deal with data services.
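As a rough illustration of the contrast in this point (my own sketch, not anything taken from Roy's thesis or from a real toolkit), here is what the two views of a data element might look like in Python. Every name here, including `Envelope`, `Representation`, `newer_of`, and the example.org URIs, is hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Envelope:
    """Message-centric view: routing information plus an opaque payload."""
    destination: str  # hypothetical queue or endpoint name
    payload: bytes    # the infrastructure never looks inside this


@dataclass(frozen=True)
class Representation:
    """Data-centric view: the data element carries its own metadata."""
    uri: str                 # stable identifier for the resource
    authority: str           # naming authority responsible for the URI (provenance)
    last_modified: datetime  # temporal relevance
    media_type: str
    body: bytes


def newer_of(a: Representation, b: Representation) -> Representation:
    """Return the more recent of two representations of the same resource."""
    if a.uri != b.uri:
        raise ValueError("representations identify different resources")
    return a if a.last_modified >= b.last_modified else b


# Hypothetical data: two snapshots of one customer record.
old = Representation(
    uri="http://example.org/customers/42",
    authority="example.org",
    last_modified=datetime(2007, 6, 1, tzinfo=timezone.utc),
    media_type="application/xml",
    body=b"<customer><city>Toronto</city></customer>",
)
new = Representation(
    uri="http://example.org/customers/42",
    authority="example.org",
    last_modified=datetime(2007, 6, 10, tzinfo=timezone.utc),
    media_type="application/xml",
    body=b"<customer><city>Calgary</city></customer>",
)
current = newer_of(old, new)

# The envelope can carry the same bytes, but says nothing about what they identify.
msg = Envelope(destination="customer-updates", payload=new.body)
```

With the envelope, questions like "which record is this?" and "is it newer than what I have?" can only be answered by application code that agrees on the payload's contents out of band. With the representation, any intermediary can answer them from the identifier and metadata alone.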

2. SOAP Web Services were originally created to be an XML-oriented replacement for COM, CORBA, and RMI/EJB. This is documented history.

They were intended to:

a. simplify integration, and solve the problems of these old approaches -- make them more MOM-like and asynchronous, and less RPC-focused.

b. also allow richer data structures through XML (vs. the old approaches that required custom marshalling or proprietary serialization).

c. give a chance for Microsoft to get "back in the game" of enterprise systems, as J2EE had pretty much eclipsed DNA. They would do this by eliminating the competition over programming models & core protocols - changing their old Microsoft-centric stance.

d. traverse firewalls by piggybacking on HTTP

The focus was clearly on XML as a marshaling format. The hidden assumption seems to be that if we fix the above, the "distributed object nirvana" that we longed for from the COM / CORBA days would take hold. SOA added "governance" to this mix. While SOA governance may deal with data problems in isolated cases, there is little consistent *architectural* treatment of data in these approaches. It's still a mishmash of CBD, object-orientation, and message architecture.

Some articles to read....
September 1999: Lessons from the Component Wars, an XML Manifesto

April 2001: A Brief History of SOAP

Interesting quotes:

  • "SOAP's original intent was fairly modest: to codify how to send transient XML documents to trigger operations or responses on remote hosts"
  • "Component technology has been the cause of many arguments, disagreements, and debates. This component-induced friction can be traced to two primary factors:

    1. Different organizations and corporations want to be the de facto provider of component infrastructure.
    2. Component technology provides more opportunities for different programming cultures to interact.

    There are many lessons to be learned from examining these two factors closely. In this article, we will examine how component technology has evolved to XML."


(As an interesting aside: Both of these articles are by Microsoft's Don Box, though I think he was at DevelopMentor at the time. I think Pat Helland is one of the premier minds behind SOA. Microsoft is responsible for many, if not most, of the protocols we base WS-* style SOA implementations on. Yet, I find it fascinating that many of the SOA industry analysts, vendors, and some customers seem to treat Microsoft as an almost non-player, since they don't ship an ESB, rarely talk about SOA in the abstract, and don't cater to business consultants.)

Today, although SOAP 1.2 and WS-* have evolved this purpose into a general-purpose asynchronous protocol, it really is still a way to create a vendor-independent, interoperable replacement for MOM.

This is not to say there is no value in a better MOM -- just that there might also be a lot of value in a better way to integrate data in a distributed system. Which is why I find RESTful architectures exciting.

Posted by stu at June 10, 2007 03:17 PM