January 11, 2007

Life beyond distributed transactions

Pat Helland is one of my technology heroes. As one of the leads on Tandem's TP monitor, and later on Microsoft's COM+, he knows transactions.

At the architecture symposium at Microsoft's PDC 2003, I felt that Pat's talks were worth the price of admission on their own. He single-handedly summarized why SOA was a good thing, in practical, technical detail. He understood services and their implications for data consistency. It is a testament to the dysfunction of our industry that we remain confused about SOA when Pat had it nailed back then and was communicating it in simple terms. I was so jazzed that I wrote an article in early 2004 that was largely influenced by Pat Helland, fused with a bit of my own perspective and long-windedness.

Pat's overall theory concerned the nature of data and interoperability at scale. One can't use distributed transactions at scale because they imply a level of trust one can't grant in a multi-agent system (you don't hand your lock manager to a third party in Taipei when you're in Brussels). He has used a number of metaphors for the same idea over the years: fortresses vs. emissaries, service-agents vs. service-masters. In retrospect, viewed in the context of Roy Fielding's work, this is clearly user-agent vs. origin-server.

In terms of "data elements", Pat suggested a distinction between resources and activity data (with reference data transferred between them), and now, in this recent paper, between entities and activity data. (link via Mark Baker, via Mark McKeown)

So, while the two Marks are suggesting that Pat has reached REST the hard way, I would suggest this is something he's been saying for years, which is why I've never seen SOA as being at odds with REST. In 2003, here was Microsoft's lead architecture guru suggesting that all of this WS activity would culminate in this new architectural view of scalable interoperability. Then he left MS in late 2004, and people seemed to ignore him.

Anyway, in REST terms, reference data corresponds to representations, entity data to a resource (keyed by a resource identifier), and activity data to the set of representations as seen by a user agent. This latest paper seems to have added an emphasis on keys/identifiers for the entities (the resource identifier in REST, or the URI in HTTP).
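
For the code-minded, the mapping above could be sketched roughly like this in Python. This is only an illustration under my own assumptions: the class names (Representation, OriginServer, UserAgent), the version counter, and the "/orders/42" URI are all made up for the example and come from neither Pat's paper nor Roy's thesis.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Representation:
    """Reference data: an immutable snapshot of an entity (hypothetical type)."""
    uri: str
    version: int
    body: str


class OriginServer:
    """Owns the entities (resources), each keyed by an identifier (a URI here)."""

    def __init__(self) -> None:
        # uri -> (version, body); the version counter is an illustrative assumption
        self._entities: Dict[str, Tuple[int, str]] = {}

    def put(self, uri: str, body: str) -> None:
        # Each update to an entity bumps its version.
        version = self._entities.get(uri, (0, ""))[0] + 1
        self._entities[uri] = (version, body)

    def get(self, uri: str) -> Representation:
        # Hand out a snapshot; the entity itself never leaves the server.
        version, body = self._entities[uri]
        return Representation(uri, version, body)


class UserAgent:
    """Holds activity data: the set of representations this agent has seen."""

    def __init__(self) -> None:
        self.activity: Dict[str, Representation] = {}

    def fetch(self, server: OriginServer, uri: str) -> Representation:
        rep = server.get(uri)
        self.activity[uri] = rep
        return rep


server = OriginServer()
agent = UserAgent()
server.put("/orders/42", "pending")
agent.fetch(server, "/orders/42")
server.put("/orders/42", "shipped")  # the entity moves on without the agent
stale = agent.activity["/orders/42"]  # the agent still holds the old snapshot
```

The thing to notice is that nothing here locks the entity on the agent's behalf. The agent works from whatever representations it last saw and has to cope with them being out of date, which is the trade being made when you give up the distributed transaction.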

Rather than being "REST the hard way", this is exactly the kind of paper that people in this debate need to see, understand, and discuss. It addresses a topic that's often cited as a reason why HTTP is not enough and why WS-* protocols are needed -- data consistency and reliable messaging. It also closes an implicit loop in REST when dealing with machine-to-machine interoperability -- origin servers can also be user agents, managing a set of known representations (activity data). That's the point of "hypermedia as the engine of application state", which may be obvious if you've understood Roy's thesis for years, but is less obvious to those who come from a distributed objects or transaction processing background.

Posted by stu at January 11, 2007 08:09 AM