July 26, 2007

Semiotics and REST

I think the biggest confusion about REST comes from treating it as a protocol. It isn't one; it's a way to think about extremely big information systems. To compare prior models with REST, one has to think about the problem of information management & manipulation in a network.

Traditionally, distributed systems saw data as globally consistent -- one used two-phase commit to ensure this consistency.

However, many organizations have applications with "copies" of data, or with their own independent database, and use replication or messaging to enable a level of partial consistency. With this approach, one can view the network as having "autonomous" services, each with its own independent view of information. This autonomous model is more common in practice in most enterprises, it's the most scalable, and it's also the view that SOA tends to take.

One of my favorite discussions of the implications of an "autonomous" model of information management is from Pat Helland. His idea, one I blogged about way back in late 2003, is a separation between "data on the outside" vs. "data on the inside", which he discussed at the Microsoft PDC and also captured in this article. "Data on the inside" is service-private data. It is encapsulated: no one can see it except the service itself. "Data on the outside" includes messages & reference data (where messages typically are the means of conveying reference data).

In this approach, information may be represented differently on either side of a service boundary (e.g. inside, as rows in an RDBMS, and outside, as an XML document).

But, here is the key point: there is a shared meaning, or concept behind both representations of the data, and the service implicitly has a 3-way "mapping" between the inside representation, the conceptual meaning of the information, and the outside representation.

This three-way relationship is also known as a semiotic relationship: between the symbol, an object, and the concept. Without this relationship, it's very hard to communicate, with any precision or integrity, ideas whose substance evolves over time, and arguably it's one of the cornerstones of information management theory.
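To make the three-way mapping concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Customer` concept, the `cust` table and its column names, and the XML element names are all hypothetical and do not come from Helland's article or from any particular service.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """The shared concept: what both representations are 'about' (hypothetical)."""
    customer_id: int
    name: str


def from_inside(row):
    """Inside representation (an RDBMS row) -> concept."""
    return Customer(customer_id=row[0], name=row[1])


def to_outside(customer):
    """Concept -> outside representation (an XML document)."""
    root = ET.Element("customer", id=str(customer.customer_id))
    ET.SubElement(root, "name").text = customer.name
    return ET.tostring(root, encoding="unicode")


# Data on the inside: private to the service, shaped by its own schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (id INTEGER PRIMARY KEY, nm TEXT)")
conn.execute("INSERT INTO cust VALUES (42, 'Ada')")
row = conn.execute("SELECT id, nm FROM cust WHERE id = 42").fetchone()

# Data on the outside: what crosses the service boundary.
outside_xml = to_outside(from_inside(row))
print(outside_xml)
```

The inside schema (`cust.nm`) and the outside document (`<name>`) can each change independently, as long as both still map onto the same concept.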

To contrast the two models of REST and SOA:

In SOA, this "conceptual mapping" is implicit in the service boundary. Many such mappings may be conveyed through a service boundary. They are always there, but are usually tacit, or encoded in an application-specific manner.

In REST, this semiotic "mapping" between an information concept, the inside of a service & the data represented outside a service, is called a resource. And each resource is given one or more unique identifiers in a uniform syntax.

In SOA, the service contract is the key abstraction of an information system. It forces the information system into a model where everything is viewed as a shared agreement between one or more producers & consumers of messages.

In REST, the resource is the KEY abstraction of a global information system. One service = one resource. It forces the whole information system into an application model where all actions are generalized into uniform methods of sign (representation) exchange. And the representations themselves contain uniform links to other resources, ensuring that no out-of-band information is ever required to interact with the system -- connected resources, pulled and manipulated as desired, become the engine of any agent's desired ends.
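As a rough illustration of uniform methods plus linked representations, here is a toy in-memory sketch in Python. It is not an HTTP implementation. The `Web` class, the URIs, the `links` key, and the link relation names are all hypothetical, chosen only to show an agent getting from one resource to another using nothing but what the representations themselves contain.

```python
class Web:
    """A toy 'web': every resource is addressed by a URI and
    manipulated through the same small set of methods."""

    def __init__(self):
        self._resources = {}

    def put(self, uri, representation):
        # Uniform method: store a representation at a URI.
        self._resources[uri] = representation

    def get(self, uri):
        # Uniform method: retrieve the current representation at a URI.
        return self._resources[uri]


def follow(web, start_uri, *rels):
    """Start at one URI and follow named links found in each representation."""
    representation = web.get(start_uri)
    for rel in rels:
        representation = web.get(representation["links"][rel])
    return representation


web = Web()
web.put("/orders/1", {"status": "open", "links": {"customer": "/customers/42"}})
web.put("/customers/42", {"name": "Ada", "links": {"orders": "/orders/1"}})

# The agent only knows the starting URI and the link relation name;
# the customer's URI is discovered from the order's representation.
customer = follow(web, "/orders/1", "customer")
print(customer["name"])
```

The client code never hard-codes `/customers/42`. That is the sense in which no out-of-band information is needed beyond an entry point and an understanding of the representations.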

The caveat:

Using REST for the problem space that WS-* is intended to solve still requires a lot of work by industry. There aren't enough standards to make this as easy as it could be. Still, with the publication of Atompub, the burgeoning Microformats effort, etc., we're in a very good state.

The point of these debates, yet again:

To me, it is not that WS-* sucks, or that REST is a faddish religion. It is that vendors are not addressing fundamental problems in the application model that SOA derives from, i.e. a hybrid of component-based development, OO-RPC, and messaging-oriented middleware. It is bound to hit a wall of our own making, as currently practiced.

We've been trying one variant or another of this approach for 15+ years, and only recently have gotten reasonably good at it. We convinced ourselves that XML Infosets would solve the political and usability challenges. But even if we standardize transactions, and security, and reliability in XML infosets, we still do not have a very scalable, interoperable, or loosely coupled model for information systems -- because everyone will still be inventing their own!

The real problem lurking was that we, with SOA, weren't treating information as an asset: a resource that can evolve over time. Even if we knew that these resources existed, and should be managed with care, they were tacitly hidden in our IDL, schemas, and WSDLs, or in a "governance document" of some sort. We weren't enabling a low barrier to entry for accessing those resources in our information systems. And we weren't connecting our services together into a web, where discovery was a natural act.

Yet the World Wide Web has effectively nailed a good chunk of these problems. We could re-invent the Web in XML -- but why? Couldn't we use it for its strengths, while integrating the WS-* technologies where they really add value in enhancing (instead of replacing!) the Web?

Posted by stu at July 26, 2007 07:53 PM