Building a hypermedia-aware client is rather different from building a typical client in a client/server system, and it may not be immediately intuitive. But I believe the notions are rooted (quite literally) in decades of experience in other agent-oriented computing domains. Game behaviour engines, control systems, and reactive or event-driven systems have all been developed with this programming approach in mind.
The normal way we build clients, in a client/server architecture, looks something like this:
The logic of the application (its objectives and how it wants to achieve them through one or more remote services) is often procedural. A rich OO domain model is sometimes preferred to procedural logic, but it isn't usually used in conjunction with remote services because of the latency involved; instead, a service facade coalesces communication into coarse-grained interactions.
This idea of a service facade culminates in SOA, where interfaces, along with all their possible message exchange patterns, are registered for others to look up:
An agent-oriented client, on the other hand, looks something like this (which I've adapted from Russell & Norvig's diagram):
The application agent has several pools of pre-defined logic:
a) Application Logic: some logic for the application itself (e.g. the basic states of a hypermedia application, and the goals of the application, if it has any; a browser has no goals other than rendering, whereas a product ordering & payment agent would have the goal of completing e-commerce transactions on behalf of a user).
b) Action Logic: some logic for the implications of actions (e.g. how does a payment & product ordering agent know, interpret, or infer that PUT/POST/DELETEing to a particular sequence of URIs will result in a paid product order?)
c) Protocol Logic: some built-in logic for handling protocols & media types (e.g. URI, MIME, HTTP, and maybe some mix of HTML, Atom, Atompub, etc.).
The problem of bridging application and action logic together is known as Action Selection. Action selection doesn't require fancy algorithms. Its study has often dealt with complex subject matter, which has often led to complex solutions. But in most agents, the bread and butter of action selection is simple: the Finite State Machine (FSM). An agent responds to changes in its environment based on its current state and a set of known transitions. Other approaches to agent programming, such as planners, are growing in popularity, but let's start with FSMs.
Firstly, an agent's application logic requires a state machine to describe the relationship between sensing ("safe") actions and changing ("unsafe") actions. In a hypermedia application it looks something like this:
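A minimal sketch of that two-state machine in Python, assuming the agent classifies requests purely by HTTP method (GET/HEAD/OPTIONS as sensing, POST/PUT/DELETE/PATCH as effecting). The class and method names are hypothetical. The unconditional return to sensing after a response reflects the idea that, after an unsafe request, an agent can't assume what actually happened and has to look again.

```python
from enum import Enum


class State(Enum):
    SENSING = "sensing"      # issuing safe requests to observe resource state
    EFFECTING = "effecting"  # issuing unsafe requests to change resource state


SAFE = {"GET", "HEAD", "OPTIONS"}
UNSAFE = {"POST", "PUT", "DELETE", "PATCH"}


class HypermediaFSM:
    """Hypothetical two-state hypermedia application machine."""

    def __init__(self) -> None:
        self.state = State.SENSING

    def on_request(self, method: str) -> State:
        method = method.upper()
        if method in SAFE:
            self.state = State.SENSING
        elif method in UNSAFE:
            self.state = State.EFFECTING
        else:
            raise ValueError(f"unrecognised method: {method}")
        return self.state

    def on_response(self) -> State:
        # After an unsafe request the outcome is not guaranteed, so the
        # agent returns to sensing to re-read the affected representations.
        self.state = State.SENSING
        return self.state


fsm = HypermediaFSM()
print(fsm.on_request("GET"))    # State.SENSING
print(fsm.on_request("POST"))   # State.EFFECTING
print(fsm.on_response())        # State.SENSING
```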
This basic hypermedia application state machine is sandwiched hierarchically between the super-state machine for the application's goals and the sub-state machines for the protocols:
The trick with building a RESTful agent based on FSMs would be to figure out a way such that
a) The application's goals can be expressed in terms of hypermedia agent logic (e.g. sensing & effecting)
b) The hypermedia types and link relations themselves contain enough interpretable action logic that it can be mapped to the application's domain
c) The action and protocol state machines are modular. RESTful applications tend to have a relatively small number of standardized, generic protocols, so these need to be repurposed for different applications and/or contexts.
Two ways of accomplishing this include hierarchical FSMs and behaviour trees.
Hierarchical FSMs are popular in control systems and game engines. They are great for reactive systems, where the intent of the application is to correctly interpret and respond to inputs and events; call control and climate control systems are examples. There are powerful, generic hierarchical FSM standards out there, like SCXML, that provide a code-on-demand approach to interpreting and managing states across a set of resources (though SCXML could probably use some RESTful-friendly polish).
Behaviour trees have the same power as hierarchical FSMs, but tend to be more oriented towards goal-based applications, where the purpose of the application is to transition a set of resources to some new state. A calendar scheduling agent and a payment & ordering agent are examples of goal-oriented agents.
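To make the "sandwich" of super-states and sub-states concrete, here is a minimal hierarchical FSM sketch in plain Python (not SCXML). An event is offered to the innermost active state first and bubbles up to its parent if unhandled, so a goal-level transition such as cancelling applies to every hypermedia sub-state at once. The state and event names, including the Restbucks-style "order_coffee" goal, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HState:
    name: str
    parent: Optional["HState"] = None
    # event name -> name of the (leaf) state to enter
    transitions: Dict[str, str] = field(default_factory=dict)


class HierarchicalFSM:
    def __init__(self, states: List[HState], initial: str) -> None:
        self.states = {s.name: s for s in states}
        self.current = self.states[initial]

    def active_path(self) -> List[str]:
        """Active states from innermost to outermost."""
        node, path = self.current, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path

    def dispatch(self, event: str) -> bool:
        # The innermost state gets first refusal; unhandled events bubble up.
        node = self.current
        while node is not None:
            if event in node.transitions:
                self.current = self.states[node.transitions[event]]
                return True
            node = node.parent
        return False


# Hypothetical states: a goal super-state wrapping hypermedia sub-states.
idle = HState("idle", transitions={"start": "sensing"})
goal = HState("order_coffee", transitions={"cancel": "idle"})
sensing = HState("sensing", parent=goal, transitions={"link_found": "effecting"})
effecting = HState("effecting", parent=goal, transitions={"response": "sensing"})

fsm = HierarchicalFSM([idle, goal, sensing, effecting], initial="idle")
fsm.dispatch("start")
print(fsm.active_path())   # ['sensing', 'order_coffee']
fsm.dispatch("cancel")     # handled by the goal super-state
print(fsm.active_path())   # ['idle']
```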
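For comparison, a minimal behaviour tree sketch in Python for a hypothetical payment & ordering agent. It uses only the two classic composites, a Sequence (every child must succeed, in order) and a Selector (the first child to succeed wins), and omits the "running" status that real engines use for long-lived actions. The leaf actions only record the HTTP interactions they would perform; the URIs and the blackboard keys (`card_ok`, `voucher_ok`) are assumptions.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Action:
    """Leaf node wrapping a function of the shared blackboard."""

    def __init__(self, fn: Callable[[dict], bool]) -> None:
        self.fn = fn

    def tick(self, bb: dict) -> Status:
        return Status.SUCCESS if self.fn(bb) else Status.FAILURE


class Sequence:
    """Succeeds only if every child succeeds, tried in order."""

    def __init__(self, *children) -> None:
        self.children = children

    def tick(self, bb: dict) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Selector:
    """Succeeds as soon as one child succeeds, tried in order."""

    def __init__(self, *children) -> None:
        self.children = children

    def tick(self, bb: dict) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def step(request: str, flag=None) -> Action:
    """Hypothetical leaf: log the request; succeed unless bb[flag] is falsy."""
    def run(bb: dict) -> bool:
        bb["log"].append(request)
        return True if flag is None else bool(bb.get(flag))
    return Action(run)


order_tree = Sequence(
    step("POST /orders"),
    Selector(step("PUT /payment (card)", "card_ok"),
             step("PUT /payment (voucher)", "voucher_ok")),
    step("GET /receipt"),
)

bb = {"log": [], "card_ok": False, "voucher_ok": True}
print(order_tree.tick(bb))  # Status.SUCCESS
print(bb["log"])
```

The Selector is where the goal orientation shows: the agent cares that payment happens, not which of the offered ways achieves it.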
In future, I'm going to explore how to build a behaviour tree-based agent, probably for the Restbucks domain that Jim Webber, Savas Parastatidis, and Ian Robinson have been using for the past year or so and are including in their "REST in Practice" book.

Excellent.
Very nice. For me, the client is the hard part with REST, but most content focuses on the server.
I have a bit of a problem with the notion of "Action Logic", though. If you think in terms of command and event messages (see http://bill-poole.blogspot.com/2008/04/avoid-command-messages.html), then the client is sending command messages to the hypermedia/protocol processor, which converts them into HTTP requests whose constraints have been augmented beyond the minimal HTTP protocol constraints to include the constraints of the underlying action.
However, if you look at VoiceXML, CCXML and even HTML, the client emits event messages into the hypermedia processor. When translated to HTTP requests, there are no processing constraints carried over to the server. To me, this is critical in RESTful client-server decoupling and service evolvability.
AtomPub breaks this pattern, but the action constraints in vanilla AtomPub are barely more than the constraints of HTTP so the impact on decoupling/evolvability is minimized. But when folks try to extend Atom with their custom "actions" (usually via custom rels) they end up with all this coupling and then scratch their heads wondering where it came from.
As a rule of thumb I would suggest only sending events into the hypermedia processor, but if you must have "actions" they should map closely to HTTP.
Hi Andrew,
I'd like to understand more about what you mean by "processing constraints". Do you mean the agent coupling against certain facets of a prior hypermedia representation that surrounds a link? Or do you mean that the agent constructs a command message inside an overloaded POST based on a link relation? If the latter, I agree that a POST representation shouldn't include command-like directives in it, as that increases coupling. (Though with simple states, like telling a lightbulb to "turn on", it's a bit grey as to whether that's a command or just state transfer ;-)
The point is that the hypertext or link relation should be able to describe the *data* pre and postconditions of a uniform action (e.g. POSTing to a URI), so this can be evaluated against the condition rules and/or goals of the agent.
The challenge is that the use of the unsafe methods (POST especially, but arguably also PUT and DELETE) to achieve an application goal (buying a book, turning on a lightbulb) always pushes some coupling into link relations or into media types if the agent is to have any ability to evaluate the utility of that POST for achieving its rule/goal.
None of this is a bad thing, IMO. Just that care must be taken that the actual POST interaction is still respecting a "state transfer" semantic, not being overloaded into a "command issuance" semantic. This is in part due to coupling, but also because after conducting the POST, an agent can't make any guaranteed assumptions as to what actually happened, it has to go back into the "Sensing" state to check if its cached representations have changed, or to sense any interesting links that came from the POST response.
So first, I tend to think of a RESTful client as having two key sub-components, a "hypermedia processor" and a "client platform" -- mainly because the clients I've built (VoiceXML and CCXML mainly, but not exclusively) were structured this way, but also because I think it's just a good way to think about the client. In this model the hypermedia processor receives the initial URI from the platform, retrieves and parses pages, maintains the application session state, and interacts with the platform using a well-defined interface. In the case of HTML, the browser might be the hypermedia processor and the OS, GDI or window manager might be the platform -- or maybe you can draw a line through the middle of the browser and say one half is the hypermedia processor and the other half is the platform for rendering, window and input management, etc.
Now by processing constraint I simply mean, well... constraints on the processing done by a component in certain situations. So, as discussed in the link I provided earlier, an event message constrains the processing of the sender (the message is only sent if the sender has performed some well-defined processing) and a command message constrains the processing of the receiver. Note, though, that in situations where A communicates with B, and B with C and D, if a command is sent by A to B, the related processing might be performed by B or by some combination of B, C and D. So if A sends a "processOrder" command to B, then that order processing might involve C and D as well -- but the order gets processed.
A third type of message is a "document message": one that places no processing constraints on either the sender or the receiver. The data exchanged in the Unix shell-like pipe-and-filter style have this characteristic, which makes the model very versatile. I tend to think of the RESTful use of HTTP as a protocol for exchanging document messages -- so a GET is a command message to the server with processing constraints such as "send me back the associated reply", "side effects must be safe and idempotent", etc., and the reply is an event carrying a document message as payload. This model goes a long way towards client and server decoupling -- it allows a client to be a browser, spider or whatever, because the server has no constraints on client processing. The uniform interface provides a generic and very minimal set of processing constraints on the various services, allowing the services to be interchanged and evolved. This is all related to the "state transfer" semantics you talked about.
But if you don't design your hypermedia and clients properly (and the two are and should be closely related), this decoupling can quickly fall apart (or should that be fall together?). In VoiceXML, CCXML and HTML, the interaction between the "platform" and the "hypermedia processor" is defined in terms of commands from the hypermedia processor to the platform and events from the platform to the processor -- i.e. no processing constraints placed on the hypermedia processor and therefore no constraints *transferred* to the service. This is also why I use the term “platform” as a label – it is exposing an API used by the hypermedia processor.
But what I often see are hypermedia/relation and client designs that put constraints on the service, i.e., a BuyBook rel in the markup, and usually a similar function or message in the hypermedia processor implementation. With enough of these, you end up with an API of service methods defined as relations; the hypermedia processor exposes them as functions/commands to the platform (which at this point has stopped looking like a "platform", as it is using an API as opposed to exposing one) and is essentially acting as a proxy for the service.
I wonder if designing the client in terms of “actions” that need to be realized in terms of requests to a service can be done without over-constraining the service. How do you evolve a service to support new kinds of actions without also updating the hypermedia + rels and the client to support those actions?
Rather than mapping actions to requests, it seems to me that mapping events to requests is the better strategy. Rather than a BuyBook action, use a BookDesired event and a ProcessBookOffer command (from the hypermedia processor to the platform) and a corresponding acceptOffer/rejectOffer event from the platform. The difference is that with the BuyBook action, the service is constrained to processing book orders. With the platform event/command based approach, a service could process book orders, or simply be performing a survey of what books the clients are interested in, or doing something else I haven’t thought of.
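A rough Python sketch of that event/command split, under stated assumptions: BookDesired is assumed to be raised by the platform, the offer URI is assumed to come from the service's hypertext, and the processor is assumed to report the platform's decision back as a plain POST of state. All names are hypothetical, not a real library. The point is that the processor has no BuyBook operation, so the service stays free to treat the offer as a sale, a survey, or something else.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class BookDesired:        # event: platform -> hypermedia processor
    title: str


@dataclass
class ProcessBookOffer:   # command: hypermedia processor -> platform
    title: str
    offer_uri: str


@dataclass
class OfferDecision:      # event: platform -> hypermedia processor
    accepted: bool        # acceptOffer / rejectOffer


Platform = Callable[[ProcessBookOffer], OfferDecision]


class HypermediaProcessor:
    def __init__(self, platform: Platform) -> None:
        self.platform = platform
        # Uniform-interface requests the processor would send (not sent here).
        self.outbox: List[Tuple[str, str, dict]] = []

    def on_book_desired(self, event: BookDesired,
                        offer_uri: Optional[str]) -> Optional[OfferDecision]:
        if offer_uri is None:   # the current hypertext offers nothing
            return None
        decision = self.platform(ProcessBookOffer(event.title, offer_uri))
        # The decision is transferred as state; what the service does
        # with it (sell, survey, ...) is not constrained by the client.
        self.outbox.append(("POST", offer_uri,
                            {"title": event.title, "accepted": decision.accepted}))
        return decision


hp = HypermediaProcessor(lambda cmd: OfferDecision(accepted=True))
hp.on_book_desired(BookDesired("REST in Practice"), "http://example.org/offers/42")
print(hp.outbox)
```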
Sorry for the long reply – I wanted to take a step back and explain from the ground up to make sure I was being clear. Hope it worked!
Your line of thought matches closely what I have been developing in my brain over the last years, too. I view the client-side 'program' as a dependency tree of goals ('select book' must be finished before 'buy book' can be pursued) through which the client is driven *by the service*.
However, I also found during coding exercises that client-side code essentially always comes down to handling failed client-side expectations, no matter how reactive or smart the client is. In the end, it repeatedly comes down to accepting that 4xx Client Error responses are part of the client-server contract and are not to be interpreted as a 'broken contract'.
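A dependency tree of goals like this can be sketched with Python's standard `graphlib` module (Python 3.9+). The goal names are hypothetical, and `offered` stands in for whatever check tells the agent that the service's current hypertext offers a way to pursue a goal; when it doesn't, the agent stops and goes back to sensing rather than forcing the next step.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def plan(goals: Dict[str, Set[str]]) -> List[str]:
    """Order goals so that every prerequisite comes before its dependants."""
    return list(TopologicalSorter(goals).static_order())


def pursue(goals: Dict[str, Set[str]], offered: Callable[[str], bool]) -> List[str]:
    """Follow the plan only as far as the service currently allows."""
    completed = []
    for goal in plan(goals):
        if not offered(goal):
            break  # no link for this goal yet: return to sensing
        completed.append(goal)
    return completed


# goal -> goals that must be finished first (hypothetical)
goals = {"buy book": {"select book"}, "select book": {"find catalogue"}}
print(plan(goals))  # ['find catalogue', 'select book', 'buy book']
print(pursue(goals, offered=lambda g: g != "buy book"))
```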
Instead, client owners should accept the fact that something goes wrong from time to time and make use of the application state provided as the body of any 4xx response. How useful that is depends largely on media type design and that is where I think most effort should be spent.
Anyway, very nice work.
Jan, I agree with you; the client-side code needs to handle failed expectations, such as:
- "WHOA I was expecting this media type but got another one, is there anything I can do with this new one?", or
- "Hey I got a 4xx, I must have done something wrong, let's try something else", or
- "a 5xx, Hmm.. Maybe I should retry later."
The trick, I think, for people building new kinds of user agents is to build up a fairly generic toolkit so that every developer doesn't have to reinvent the world for every user agent ;-)
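One small piece of such a toolkit might be a policy function that maps a response onto the three reactions above. This Python sketch is hypothetical: the returned labels are made up, only the status class and media type are inspected, and a real agent would also look at headers such as Retry-After and at the body of a 4xx response.

```python
from typing import AbstractSet


def next_step(status: int, content_type: str, understood: AbstractSet[str]) -> str:
    """Hypothetical policy mapping a response to the agent's next move."""
    media_type = content_type.split(";")[0].strip().lower()
    known = media_type in understood
    if 500 <= status <= 599:
        return "retry-later"                 # server trouble: back off
    if 400 <= status <= 499:
        # A 4xx is part of the contract, not a broken one: if we can read
        # the error representation, sense it; otherwise try another path.
        return "read-error-state" if known else "try-something-else"
    if 200 <= status <= 299:
        return "process" if known else "look-for-alternative"
    return "unhandled"                       # e.g. 1xx/3xx, left out here


ATOM = {"application/atom+xml"}
print(next_step(200, "application/atom+xml; type=feed", ATOM))  # process
print(next_step(200, "text/html", ATOM))                        # look-for-alternative
print(next_step(409, "application/atom+xml", ATOM))             # read-error-state
print(next_step(503, "text/html", ATOM))                        # retry-later
```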
Andrew,
So, I drew this to see if I get what you're describing.
Here, the "well-defined interface" between the hypermedia processor and the platform is the arrow between the Controller and the Application State & Logic boxes. I used an MVC-style design to make it relatable to others, not that it's mandatory. I also suggested that the View can be several things, such as Windows, Icons, Menus, Pointers (WIMP), or could also be Web resources, whether rendered for humans (HTML/CSS/PNG) or for other machine processors (Atom, Atompub, etc.).
With regards to events vs. documents vs. commands, I agree that it's important to "prefer documents over commands". But an event to me is still just a kind of document in REST, because REST itself is not an event-based architecture. Resources that represent event producers or sinks still need to communicate through the transfer of representation (i.e. documents).
I think there is a case here, however, where the requirements for some hypermedia agents may differ from those of VoiceXML or a browser. I think you're describing a reactive agent in those cases, which reacts to both the hypermedia and the client platform. That area is where most of our success lies in building user agents to date, IMO. But I'm thinking there is a desire for a goal-directed agent -- which is an area with less success to date.
Goal-directed agents are really an example of integration. One of my problems with the field of EAI and SOA has long been its focus on automated processes, whereas so much activity is goal-driven rather than rigid and process-driven. I see REST as a foundation for building goal-directed integration rather than being stuck in a process-centered mindset (though it certainly can do the latter too, at the cost of increasing your coupling beyond what the style suggests).
I do not believe you can get away from Actions in general when you're trying to build agents that are intended to *integrate* the intent of the service consumer with the different ways a service provider may expose their service. The use case example is, "a book buying agent that works across Barnes & Noble and Amazon".
When building goal-directed agents, the transfer of a document is an "Act" with some motivation and a set of expectations. How does the agent know that this Act is getting it closer to its goal? This is easy to understand on the "Sensing" side of a hypermedia application (it's safe). It's the "Effecting" part where you need to be careful, and there seem to be three choices:
a. Bind to a dynamic document exchange interface à la WADL or WSDL 2.0, or something similar
b. Restrict yourself to CRUD (e.g. Atompub-style PUT/DELETE or POST+201)
c. Describe the expected contract of the hyperlink in the hypertext or link relation and let the agent evaluate.
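A minimal Python sketch of option (c), assuming each link relation (or the hypertext around a link) could declare simple data pre- and postconditions as sets of facts. The relation URIs, fact names, and the "useful" rule (applicable now, and expected to establish a goal fact not yet believed) are all hypothetical. Postconditions are treated as expectations to be re-sensed after the request, not as guarantees.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class LinkContract:
    rel: str
    method: str
    pre: FrozenSet[str]    # facts that must hold before following the link
    post: FrozenSet[str]   # facts expected to hold afterwards (to be re-sensed)


@dataclass
class Agent:
    goal: FrozenSet[str]
    beliefs: Set[str] = field(default_factory=set)

    def useful(self, link: LinkContract) -> bool:
        unmet = self.goal - self.beliefs
        return link.pre <= self.beliefs and bool(link.post & unmet)

    def choose(self, links: List[LinkContract]) -> List[str]:
        return [link.rel for link in links if self.useful(link)]


links = [
    LinkContract("http://example.org/rels/payment", "PUT",
                 frozenset({"order-placed"}), frozenset({"order-paid"})),
    LinkContract("http://example.org/rels/cancel", "DELETE",
                 frozenset({"order-placed"}), frozenset({"order-cancelled"})),
]
agent = Agent(goal=frozenset({"order-placed", "order-paid"}), beliefs={"order-placed"})
print(agent.choose(links))  # ['http://example.org/rels/payment']
```

The HTTP method stays uniform; only the declared data expectations differ between the two links.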
CRUD-style services have their place, but can place a burden on the client that defeats the point of client/server architecture (simplifying the client!). And WADL-style services also have their place (heresy!), at the cost of increased coupling. That's life, sometimes. But clearly, (c) is what we need to develop.
The tradeoff, of course, is that an agent looking to Effect things can't just rely on the HTTP semantics; it has to rely on contracts described in link relations and hypertext, and transfer documents while not violating the HTTP semantics. Now, how do you make hypertext and link relations evolvable? I have some ideas, and that's where I intend to dig...
On the diagram -- not quite. I would put a lot of the "brains" that you have in the hypermedia processor into the client platform. I see the hypermedia processor as quite dumb -- it has no "intent"; it just "executes" the markup and scripts, stores some of the related session state (though not all; some state can be stored in the client platform), and interacts with the platform for everything else.
Interaction for HTML is something like:
- CP->HP: navigate to this URI (initial URI)
- HP->CP: navigating to URI (informative)
- HP->CP: loaded, and here are some commands to exec (e.g. paint screen this way)
- CP->HP: mouse click at X,Y (HP figures out this is a clicked link and follows the link)
- HP->CP: navigating to URI (informative)
- HP->CP: loaded, and here are some commands to exec
- and so on
(Note there are exceptions to the event/command breakdown I described earlier -- the key is that processing constraints imposed on the HP never carry over to the service. The processing constraints of the HP on the service are only those defined by the uniform interface.)
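That interaction can be simulated with a deliberately dumb hypermedia processor in Python. This is only a sketch: `fetch` stands in for an HTTP GET plus parsing, a "page" is a hypothetical dict of paint commands and clickable regions, and hit-testing is reduced to exact coordinates. The processor has no intent of its own; it turns platform events (navigate, click) into plain GETs and hands commands back to the platform.

```python
from typing import Any, Callable, Dict, List, Tuple

Page = Dict[str, Any]          # {"paint": str, "links": {(x, y): uri}}
Message = Tuple[str, str]      # (kind, payload) sent from HP to CP


class HtmlLikeProcessor:
    def __init__(self, fetch: Callable[[str], Page]) -> None:
        self.fetch = fetch     # stands in for HTTP GET + parse
        self.page: Page = {"paint": "", "links": {}}

    def navigate(self, uri: str) -> List[Message]:
        messages = [("navigating", uri)]               # informative
        self.page = self.fetch(uri)
        messages.append(("exec", self.page["paint"]))  # commands for the CP
        return messages

    def mouse_click(self, x: int, y: int) -> List[Message]:
        uri = self.page["links"].get((x, y))
        return self.navigate(uri) if uri else []       # not a link: no transfer


pages = {
    "http://example.org/": {"paint": "draw home", "links": {(10, 20): "http://example.org/menu"}},
    "http://example.org/menu": {"paint": "draw menu", "links": {}},
}
hp = HtmlLikeProcessor(pages.__getitem__)
print(hp.navigate("http://example.org/"))
print(hp.mouse_click(10, 20))
```

Many platform events (clicks that miss, key presses) never reach the service at all; only those that resolve to a link become state transfer.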
Carrying this over to other domains, I would keep the HP simple. Any complex processing to go into the CP... but that's based on the model I'm generalizing from HTML, VoiceXML and CCXML. A different, non-reactive model might have a different breakdown.
Agree on events translated into document transfer -- that's a key part of the model: the HP translates events raised by the CP (mouse click, keypress) into state transfer. It's also many-to-one, which is a performance advantage of REST over, say, telnet for a UI.
I am definitely proposing "reactive" agents over "goal-driven" -- I keep stressing the point because nearly 100% of the REST developers I encounter "in the wild" have made a mental jump to goal-driven agents without even considering using a "reactive" model to solve their problems. Reactive agents are proven and so trying that approach first makes the most sense to me. But I don't see anyone trying it at all.
I often hear "reactive" agents equated with "user interfaces". i.e. developers assuming that their machine-to-machine scenario rules out the reactive model. But reactive agents are not restricted to user interfaces -- CCXML is proof of this.
I also think that hypermedia design is tied to the agent design, i.e. there is a difference between "reactive" and "goal-driven"/"action-driven" hypermedia. HTML, VoiceXML and CCXML have a built-in concept of "events" from an underlying platform.
Atom is arguably "Action-driven" hypermedia and I suspect that many of the client-server decoupling problems that folks have with using Atom (beyond publishing) are related to this. I have a hard time seeing how the "actions" the client is trying to effect don't constrain server behavior to a restricted set of behaviors defined in the media type and baked into the client.
A "reactive" media type instead places constraints on client behaviors. And, just as we see in the evolution of HTML, the client can't evolve without evolving the media type. Also, servicing multiple types of clients (e.g. an HTML browser and a VoiceXML browser) requires supporting multiple media types, but this is the point of conneg IMO.
I don't know if it's a typo, but you close with the question of how to make hypertext and link relations evolvable in order to expand the contracts that can be expressed beyond HTTP semantics. To me, this feels like the wrong goal -- with a reactive agent I can evolve a service while keeping the hypertext and relations constant and using the uniform interface. If you can do that with action-driven agents I would be impressed -- and maybe you can and I just don't see it (years of doing the reactive thing, I guess).
Anyway, I'll stop there -- I'm eager to hear your ideas (coming in Part 2 I guess), and the more of your time I spend on these comments, the longer I'll have to wait for them!
I'll leave this for future discussion, with two minor points
a. Yes, the reactive thing is worth considering in greater detail. I have a hunch as to why people jump to goal-driven, it has to do with the application domain(s) people work on.
b. It's not about breaking the uniform HTTP semantics; it's about describing the expectations based on the link relation or hypertext. Basically I want to pick up the stuff Mark Baker was talking about 8 years ago. In that example, he uses a basic RDF Schema axiom to provide inheritance semantics to link relations. In practice, having worked on this for a while, I don't think it's as easy as he illustrates, but in general it's the right track.
Stu, great that you bring up Mark's work! I have not worked with RDF since 2003, but looking at his subclassing approach with fresh eyes is quite exciting. However, mentioning RDF in standard enterprise projects is like asking to be kicked out right away :-)
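A rough Python sketch of that kind of link relation inheritance, without an RDF library: each hypothetical relation lists its direct super-relations, as `rdfs:subPropertyOf` statements would, and an agent that only understands a general relation can still act on a more specific one. This mirrors the RDFS rule that a statement using a sub-property also holds for its super-properties; it is an illustration, not Mark Baker's original example, and the relation URIs are made up.

```python
from typing import Dict, Set

# Hypothetical link relations -> their direct super-relations
# (the equivalent of rdfs:subPropertyOf statements).
SUBPROPERTY_OF: Dict[str, Set[str]] = {
    "http://example.org/rels/order-coffee": {"http://example.org/rels/order"},
    "http://example.org/rels/order": {"http://example.org/rels/create"},
}


def super_relations(rel: str, graph: Dict[str, Set[str]] = SUBPROPERTY_OF) -> Set[str]:
    """Transitive closure: rel plus every relation it is (indirectly) a sub-relation of."""
    seen, stack = {rel}, [rel]
    while stack:
        for parent in graph.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def understands(rel: str, known: Set[str]) -> bool:
    # An agent that only knows "order" can still act on an "order-coffee" link.
    return bool(super_relations(rel) & known)


print(understands("http://example.org/rels/order-coffee", {"http://example.org/rels/order"}))  # True
```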
Anyway - when it comes to describing resources there is really nothing more suitable than RDF (what a surprise :-). Maybe it is time to investigate uses of RDF again (RDF Forms comes to mind, too).
There's a bunch of antipathy out there towards RDF (from rather influential & respected people too, disappointingly), but I've also found a fair amount of open-mindedness in the enterprise, so long as you don't force them to understand the intricacies of OWL.
A small dose of semantics (inheritance, equivalence, disjointness, domain, range...) might be enough to make link relations fairly powerful. Maybe you need to create a less-politically-burdened language and syntax to make that happen (call it "FDR" and "BAT" and use Batman as a mascot ;)
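Two more of those semantics, equivalence and disjointness, can be sketched in a few lines of Python, here for the "book buying agent across two stores" use case from earlier in the thread. The store relation URIs are hypothetical, and the sketch covers only equivalence (merged transitively with a small union-find) and disjointness, not domain or range.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def equivalence_classes(pairs: Iterable[Pair]) -> Dict[str, str]:
    """Union-find over 'same as' statements: maps each rel to a representative."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return {rel: find(rel) for rel in list(parent)}


def equivalent(a: str, b: str, classes: Dict[str, str]) -> bool:
    return classes.get(a, a) == classes.get(b, b)


def disjoint(a: str, b: str, classes: Dict[str, str], declared: Iterable[Pair]) -> bool:
    """True if a and b fall into classes declared to be disjoint."""
    ca, cb = classes.get(a, a), classes.get(b, b)
    return any({classes.get(x, x), classes.get(y, y)} == {ca, cb} for x, y in declared)


SAME_AS = [("http://store-a.example/rels/checkout", "http://store-b.example/rels/pay")]
DISJOINT = [("http://store-a.example/rels/checkout", "http://store-a.example/rels/cancel")]

classes = equivalence_classes(SAME_AS)
print(equivalent("http://store-b.example/rels/pay", "http://store-a.example/rels/checkout", classes))  # True
print(disjoint("http://store-b.example/rels/pay", "http://store-a.example/rels/cancel", classes, DISJOINT))  # True
```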
Hmmm been thinking about the error handling you and Jan were discussing earlier... I wonder if you are essentially removing the receiver processing constraints from the actions/command messages to the point where they are more like events.
i.e. "BuyBook" becomes "Hey I'd really like to buy a book now if that's ok with you... but if not that's fine" which is saying more about the sender than the receiver.