Building a RESTful Hypermedia Agent, Part 1


Building a hypermedia-aware client is rather different from building a typical client in a client/server system, and it may not be immediately intuitive. But I believe the notions are rooted in (quite literally) decades of experience in other, agent-oriented computing domains. Game behaviour engines, control systems, and reactive or event-driven systems have all been developed with this programming approach in mind.

The normal way we build clients, in a client/server architecture, looks something like this:
[Figure: client/server programming model]


The logic of the application (its objectives, and how it wants to achieve them through one or more remote services) is often procedural. A rich OO domain model is sometimes preferred to procedural logic, but this isn't usually used in conjunction with remote services because of the latency involved; a service facade coalesces communication into coarse-grained interactions.

This idea of a service facade culminates in SOA, where interfaces, along with all their possible message exchange patterns, are registered for others to look up:

[Figure: SOA programming model]

An agent-oriented client, on the other hand, looks something like this (adapted from Russell & Norvig's diagram):

[Figure: agent model]

The application agent has several pools of pre-defined logic:

a) Application Logic: some logic for the application itself (e.g. the basic states of a hypermedia application, and the goals of the application if it has any; a browser has no goals other than rendering, whereas a product ordering & payment agent would have the goal of completing e-commerce transactions on behalf of a user).

b) Action Logic: some logic for the implications of actions (e.g. how does a payment & product ordering agent know, interpret, or infer that PUTting, POSTing or DELETEing to a particular sequence of URIs will result in a paid product order?).

c) Protocol Logic: some built-in logic for handling protocols & media types (e.g. URI, MIME, HTTP, and maybe some mix of HTML, Atom, AtomPub, etc.).

The problem of bridging application logic and action logic is known as Action Selection. Action selection doesn't require fancy algorithms. Its study has often dealt with complex subject matter, which has often led to complex solutions. But in most agents, the bread and butter of action selection is simple: the Finite State Machine (FSM). An agent responds to changes in the environment based on its current state and a set of known transitions. There are other approaches to agent programming that are growing in popularity, like planners, but let's start with FSMs.
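The following is a minimal Python (3.9+) sketch that makes the FSM idea concrete through table-driven action selection. The agent looks up its current state together with an observed event, moves to the next state, and records the action it should take. The state, event and action names are hypothetical and loosely Restbucks-flavoured. The "actions" are only strings; a real agent would issue HTTP requests at that point.

```python
from dataclasses import dataclass, field

# (current_state, event) -> (next_state, action). All names are hypothetical.
Transitions = dict[tuple[str, str], tuple[str, str]]


@dataclass
class FsmAgent:
    state: str
    transitions: Transitions
    actions: list[str] = field(default_factory=list)

    def on_event(self, event: str) -> str:
        """Select an action from the current state and the observed event."""
        key = (self.state, event)
        if key not in self.transitions:
            # No known transition for this state: ignore the event and stay put.
            return self.state
        self.state, action = self.transitions[key]
        self.actions.append(action)
        return self.state


ordering_agent = FsmAgent(
    state="browsing",
    transitions={
        ("browsing", "menu_loaded"): ("choosing", "pick a drink"),
        ("choosing", "order_form_found"): ("ordering", "POST the order"),
        ("ordering", "payment_link_found"): ("paying", "PUT the payment"),
    },
)
for event in ["menu_loaded", "unrelated_event", "order_form_found", "payment_link_found"]:
    ordering_agent.on_event(event)

print(ordering_agent.state)    # paying
print(ordering_agent.actions)  # ['pick a drink', 'POST the order', 'PUT the payment']
```

This sketch silently ignores unknown events. That is only one reasonable choice; an agent could just as well log such events or fall back to a default behaviour.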

Firstly, an agent's application logic requires a state machine to describe the relationship between sensing ("safe") actions and changing ("unsafe") actions. In a hypermedia application it looks something like this:

[Figure: hypermedia application state machine]
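Here is a rough Python (3.9+) sketch of the sensing/effecting split, built on my own assumptions. GET, HEAD, OPTIONS and TRACE are HTTP's safe methods, so they count as sensing. POST, PUT, PATCH and DELETE count as effecting, and that unsafe set is not meant to be exhaustive. The rule that the agent must sense again after every unsafe request is a hypothetical policy added for illustration. It is not necessarily part of the state machine described above.

```python
from enum import Enum

# GET, HEAD, OPTIONS and TRACE are defined as safe methods by HTTP.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
UNSAFE_METHODS = frozenset({"POST", "PUT", "PATCH", "DELETE"})


class Phase(Enum):
    SENSING = "sensing"      # observing resource state
    EFFECTING = "effecting"  # changing resource state


def phase_for(method: str) -> Phase:
    """Classify a request as a sensing or an effecting action."""
    m = method.upper()
    if m in SAFE_METHODS:
        return Phase.SENSING
    if m in UNSAFE_METHODS:
        return Phase.EFFECTING
    raise ValueError(f"unrecognized method: {method}")


def next_phase(current: Phase, method: str) -> Phase:
    """Hypothetical rule: after an unsafe request the agent's view of the
    resource is stale, so it must sense again before effecting anything else."""
    requested = phase_for(method)
    if current is Phase.EFFECTING and requested is Phase.EFFECTING:
        raise RuntimeError("sense resource state before another unsafe request")
    return requested


phase = Phase.SENSING
for method in ["GET", "POST", "GET", "DELETE"]:
    phase = next_phase(phase, method)
print(phase)  # Phase.EFFECTING
```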

This basic hypermedia application state machine is sandwiched hierarchically between the super-state machine for the application's goals and the sub-state machines for the protocols:

[Figure: the state machine sandwich (goal super-states, hypermedia application states, protocol sub-states)]
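One way to sketch the sandwich in Python is as three layered machines. An event always enters at the innermost (protocol) layer. If that layer has no transition for it, the event bubbles up to the enclosing layer, much as statecharts pass unhandled events to ancestor states. The layer names, states and events are hypothetical. A full hierarchical FSM, such as one written in SCXML, also handles entry/exit actions and nested activation, and this sketch omits both.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """One state machine in the sandwich. Transitions map (state, event) to
    the next state; events a layer cannot handle bubble up to its parent."""
    name: str
    state: str
    transitions: dict[tuple[str, str], str]
    parent: Optional[Layer] = None

    def dispatch(self, event: str) -> str:
        key = (self.state, event)
        if key in self.transitions:
            self.state = self.transitions[key]
            return self.name  # report which layer handled the event
        if self.parent is not None:
            return self.parent.dispatch(event)
        return "unhandled"


goal = Layer("goal", "want_coffee",
             {("want_coffee", "order_accepted"): "awaiting_drink"})
hypermedia = Layer("hypermedia", "sensing",
                   {("sensing", "form_found"): "effecting",
                    ("effecting", "representation_received"): "sensing"},
                   parent=goal)
protocol = Layer("http", "idle",
                 {("idle", "send"): "awaiting_response",
                  ("awaiting_response", "response"): "idle"},
                 parent=hypermedia)

# Events always enter at the innermost (protocol) layer.
handled_by = [protocol.dispatch(e)
              for e in ["send", "form_found", "response", "order_accepted", "bogus"]]
print(handled_by)  # ['http', 'hypermedia', 'http', 'goal', 'unhandled']
```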


The trick with building a RESTful agent based on FSMs is to find a way to ensure that:

a) The application's goals can be expressed in terms of hypermedia agent logic (e.g. sensing & effecting)

b) The hypermedia types and link relations themselves contain enough interpretable action logic to be mapped to the application's domain

c) The action and protocol state machines are modular. RESTful applications tend to rely on a relatively small number of standardized, generic protocols, so these need to be repurposed for different applications and/or contexts.

Two ways of accomplishing this include hierarchical FSMs and behaviour trees.

Hierarchical FSMs are popular in control systems and game engines. They are great for reactive systems, where the intent of the application is to interpret and respond correctly to input and events; call control and climate control systems are examples. There are powerful, generic hierarchical FSM standards out there, like SCXML, that provide a code-on-demand approach to interpreting and managing states across a set of resources (though SCXML could probably use some RESTful-friendly polish).

Behaviour trees have the same power as hierarchical FSMs, but tend to be more oriented towards goal-based applications, where the purpose of the application is to move a set of resources into some new state. A calendar scheduling agent and a payment & ordering agent are examples of goal-oriented agents.
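For readers unfamiliar with behaviour trees, here is a minimal Python (3.9+) sketch of the two composite nodes they are usually built from. A sequence requires every child to succeed, in order. A selector tries alternatives until one doesn't fail. The coffee-ordering tree and its leaf names are hypothetical. The leaves return canned results instead of making HTTP requests.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Node:
    def tick(self) -> Status:
        raise NotImplementedError


class Action(Node):
    """Leaf node: in a real agent this would issue HTTP requests."""
    def __init__(self, run: Callable[[], Status]):
        self.run = run

    def tick(self) -> Status:
        return self.run()


class Sequence(Node):
    """Succeeds only if every child succeeds, in order."""
    def __init__(self, *children: Node):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status is not Status.SUCCESS:
                return status  # stop on a failed or still-running child
        return Status.SUCCESS


class Selector(Node):
    """Tries children in order until one does not fail (alternative strategies)."""
    def __init__(self, *children: Node):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE


trace: list[str] = []


def step(name: str, result: Status) -> Action:
    """Hypothetical leaf that records its name and returns a canned status."""
    def run() -> Status:
        trace.append(name)
        return result
    return Action(run)


order_coffee = Sequence(
    step("place_order", Status.SUCCESS),
    Selector(step("pay_by_card", Status.FAILURE),
             step("pay_by_voucher", Status.SUCCESS)),
    step("collect_drink", Status.SUCCESS),
)
result = order_coffee.tick()
print(result, trace)
# Status.SUCCESS ['place_order', 'pay_by_card', 'pay_by_voucher', 'collect_drink']
```

The selector expresses a fallback (pay some other way) without adding new states or transitions. That is arguably a large part of why behaviour trees suit goal-oriented agents.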

In future, I'm going to explore how to build a behaviour tree-based agent, probably for the Restbucks domain that Jim Webber, Savas Parastatidis, and Ian Robinson have been using for the past year or so and are including in their "REST in Practice" book.


2 TrackBacks


Stuart Charlton says stuff about Building a RESTful Hypermedia Agent, Part 1: Building a hypermedia

Grading REST from Community Mashup on October 22, 2010 12:00 PM

Earlier this week, Raj Singh graded the OGC's services against REST and then graded the REST architectural

12 Comments

Very nice. For me, the client is the hard part with REST, but most content focuses on the server.

I have a bit of a problem with the notion of "Action Logic" though. If you think in terms of command and event messages (see http://bill-poole.blogspot.com/2008/04/avoid-command-messages.html ), then the client is sending command messages to the hypermedia/protocol processor, which converts them into HTTP requests whose constraints have been augmented beyond the minimal HTTP protocol constraints to include the constraints of the underlying action.

However, if you look at VoiceXML, CCXML and even HTML, the client emits event messages into the hypermedia processor. When translated to HTTP requests, there are no processing constraints carried over to the server. To me, this is critical in RESTful client-server decoupling and service evolvability.

AtomPub breaks this pattern, but the action constraints in vanilla AtomPub are barely more than the constraints of HTTP so the impact on decoupling/evolvability is minimized. But when folks try to extend Atom with their custom "actions" (usually via custom rels) they end up with all this coupling and then scratch their heads wondering where it came from.

As a rule of thumb I would suggest only sending events into the hypermedia processor, but if you must have "actions" they should map closely to HTTP.

So first, I tend to think of a RESTful client as having two key sub-components, a "hypermedia processor" and a "client platform". That's mainly because the clients I've built (VoiceXML and CCXML mainly, but not exclusively) were structured this way, but also because I think it's just a good way to think about the client. In this model the hypermedia processor receives the initial URI from the platform, retrieves and parses pages, maintains the application session state, and interacts with the platform using a well-defined interface. In the case of HTML, the browser might be the hypermedia processor and the OS, GDI or window manager might be the platform. Or maybe you can draw a line through the middle of the browser and say one half is the hypermedia processor and the other half is the platform for rendering, window and input management, etc.

Now by processing constraint I simply mean, well... constraints on the processing done by a component in certain situations. So, as discussed in the link I provided earlier, an event message constrains the processing of the sender (the message is only sent if the sender has performed some well-defined processing) and a command message constrains the processing of the receiver. Note, though, that in situations where A communicates with B, and B with C and D, if A sends a command to B, the related processing might be performed by B or by some combination of B, C and D. So if A sends a "processOrder" command to B, that order processing might involve C and D as well, but the order gets processed.

A third type of message is a "document message": one that places no processing constraints on either the sender or the receiver. The data exchanged in the Unix shell-like pipe-and-filter style has this characteristic, which makes the model very versatile. I tend to think of the RESTful use of HTTP as a protocol for exchanging document messages. So a GET is a command message to the server with processing constraints such as "send me back the associated reply", "side effects must be safe and idempotent", etc., and the reply is an event carrying a document message as payload. This model goes a long way towards client and server decoupling: it allows a client to be a browser, a spider or whatever, because the server places no constraints on client processing. The uniform interface provides a generic and very minimal set of processing constraints on the various services, allowing the services to be interchanged and evolved. This is all related to the "state transfer" semantics you talked about.

But if you don't design your hypermedia and clients properly (and the two are and should be closely related), this decoupling can quickly fall apart (or should that be fall together?). In VoiceXML, CCXML and HTML, the interaction between the "platform" and the "hypermedia processor" is defined in terms of commands from the hypermedia processor to the platform and events from the platform to the processor -- i.e. no processing constraints placed on the hypermedia processor and therefore no constraints *transferred* to the service. This is also why I use the term “platform” as a label – it is exposing an API used by the hypermedia processor.

But what I often see are hypermedia/relation and client designs that put constraints on the service, i.e. a BuyBook rel in the markup and usually a similar function or message in the hypermedia processor implementation. With enough of these, you end up with an API of service methods defined as relations. The hypermedia processor exposes them as functions/commands to the platform and is essentially acting as a proxy for the service. At that point the platform has stopped looking like a "platform", as it is using an API as opposed to exposing one.

I wonder if designing the client in terms of “actions” that need to be realized in terms of requests to a service can be done without over-constraining the service. How do you evolve a service to support new kinds of actions without also updating the hypermedia + rels and the client to support those actions?

Rather than mapping actions to requests, it seems to me that mapping events to requests is the better strategy. Rather than a BuyBook action, use a BookDesired event and a ProcessBookOffer command (from the hypermedia processor to the platform) and a corresponding acceptOffer/rejectOffer event from the platform. The difference is that with the BuyBook action, the service is constrained to processing book orders. With the platform event/command based approach, a service could process book orders, or simply be performing a survey of what books the clients are interested in, or doing something else I haven’t thought of.
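Here is a minimal Python (3.9+) sketch of the event/command split just described. It assumes everything happens in memory. The BookDesired event from the platform becomes a plain GET. The ProcessBookOffer command goes to the platform, and the platform's acceptOffer/rejectOffer event becomes a request against a URI taken from the offer document. The URIs, the `title` query parameter and the choice of POST are hypothetical. In practice the method and the target would come from the hypermedia itself.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlencode


@dataclass(frozen=True)
class Request:
    method: str
    uri: str


@dataclass(frozen=True)
class Offer:
    """An offer document from the service. The URIs come from the hypermedia,
    so the processor never hard-codes what accepting an offer *means*."""
    title: str
    accept_uri: str
    reject_uri: str


# The platform answers the ProcessBookOffer command with an event.
Platform = Callable[[Offer], str]  # returns "acceptOffer" or "rejectOffer"


def on_book_desired(title: str, search_uri: str) -> Request:
    """BookDesired event from the platform -> a plain GET on a search resource."""
    return Request("GET", f"{search_uri}?{urlencode({'title': title})}")


def process_book_offer(offer: Offer, platform: Platform) -> Request:
    """Issue the ProcessBookOffer command and turn the platform's reply event
    into a uniform-interface request against a URI the document supplied."""
    event = platform(offer)
    if event == "acceptOffer":
        return Request("POST", offer.accept_uri)
    if event == "rejectOffer":
        return Request("POST", offer.reject_uri)
    raise ValueError(f"unexpected platform event: {event}")


search = on_book_desired("REST in Practice", "/books")
offer = Offer("REST in Practice", "/offers/42/accept", "/offers/42/reject")
decision = process_book_offer(offer, lambda o: "acceptOffer")
print(search)    # Request(method='GET', uri='/books?title=REST+in+Practice')
print(decision)  # Request(method='POST', uri='/offers/42/accept')
```

Nothing in the processor says "buy". The same code would work whether the service sells the book, tallies interest for a survey, or does something else with the accept/reject URIs.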

Sorry for the long reply – I wanted to take a step back and explain from the ground up to make sure I was being clear. Hope it worked!

Your line of thought matches closely what I have been developing in my brain over the last few years, too. I view the client-side 'program' as a dependency tree of goals ('select book' must be finished before 'buy book' can be pursued) through which the client is driven *by the service*.

However, I also found during coding exercises that the client side code essentially always comes down to handling failed client side expectations no matter how reactive or smart the client side is. In the end it repeatedly comes down to accepting that 4xx Client Error responses are part of the client-server contract and are not to be interpreted as 'broken contract'.

Instead, client owners should accept the fact that something goes wrong from time to time and make use of the application state provided as the body of any 4xx response. How useful that is depends largely on media type design and that is where I think most effort should be spent.
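Here is a small Python (3.9+) sketch of treating 4xx responses as part of the contract. The client keeps the response body as application state instead of raising an exception and aborting. It assumes a JSON representation with a "links" object, which is purely hypothetical. As noted above, how useful this is depends on the media type.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Outcome:
    kind: str                                  # "success", "client_error" or "other"
    state: dict = field(default_factory=dict)  # application state to continue from


def interpret(status: int, body: str) -> Outcome:
    """Treat a 4xx as part of the contract: its body is still application
    state (assumed here to be JSON) that the agent can act on."""
    document = json.loads(body) if body else {}
    if 200 <= status < 300:
        return Outcome("success", document)
    if 400 <= status < 500:
        return Outcome("client_error", document)
    return Outcome("other", document)


outcome = interpret(409, '{"message": "order already paid", "links": {"receipt": "/receipts/7"}}')
# Recover by following whatever the error representation offers.
next_uri = outcome.state.get("links", {}).get("receipt")
print(outcome.kind, next_uri)  # client_error /receipts/7
```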

Anyway, very nice work.

On the diagram: not quite. I would put a lot of the "brains" that you have in the hypermedia processor into the client platform. I see the hypermedia processor as quite dumb. It has no "intent"; it just "executes" the markup and scripts, stores some of the related session state (though not all of it, since some state can be stored in the client platform), and interacts with the platform for everything else.
Interaction for HTML is something like:
CP->HP: navigate to this URI (initial URI)
HP->CP: navigating to URI (informative)
HP->CP: loaded and here are some commands to exec (e.g. paint screen this way)
CP->HP: mouse click at X,Y (HP figures out this is a clicked link and follows link)
HP->CP: navigating to URI (informative)
HP->CP: loaded and here are some commands to exec
and so on
(Note there are exceptions to the event/command breakdown I described earlier -- the key is that processing constraints imposed on the HP never carry over to the service. The processing constraints of the HP on the service are only those defined by the uniform interface.)
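The HTML interaction above can be sketched in Python (3.9+) as a deliberately dumb hypermedia processor. It has no intent of its own. It only turns platform events (navigate, mouse click) into GETs and returns messages for the platform. The `Link` type, the bounding boxes, and the `fetch` callback that stands in for retrieval and parsing are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Link:
    href: str
    box: tuple[int, int, int, int]  # x0, y0, x1, y1 where the link was painted


@dataclass
class HypermediaProcessor:
    """A deliberately dumb HP: it turns platform events into GETs and
    hands commands back to the platform."""
    fetch: Callable[[str], list[Link]]  # stands in for GET + parse
    links: list[Link] = field(default_factory=list)
    requests: list[str] = field(default_factory=list)

    def navigate(self, uri: str) -> list[str]:
        """CP->HP 'navigate to this URI'; returns the HP->CP messages."""
        self.requests.append(f"GET {uri}")
        self.links = self.fetch(uri)
        return [f"navigating to {uri}"] + [f"paint link {link.href}" for link in self.links]

    def mouse_click(self, x: int, y: int) -> list[str]:
        """CP->HP 'mouse click at X,Y': the HP decides whether it hit a link."""
        for link in self.links:
            x0, y0, x1, y1 = link.box
            if x0 <= x <= x1 and y0 <= y <= y1:
                return self.navigate(link.href)
        return []  # clicked on nothing: no state transfer


pages = {"/": [Link("/menu", (0, 0, 100, 20))], "/menu": []}
hp = HypermediaProcessor(fetch=lambda uri: pages[uri])
print(hp.navigate("/"))        # ['navigating to /', 'paint link /menu']
print(hp.mouse_click(10, 10))  # ['navigating to /menu']
print(hp.mouse_click(10, 10))  # [] (the /menu page has no links)
print(hp.requests)             # ['GET /', 'GET /menu']
```

Many platform events (mouse moves, clicks on blank space) produce no request at all. Only the ones that land on a link become state transfer, which is the many-to-one translation mentioned in the discussion.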

Carrying this over to other domains, I would keep the HP simple, with any complex processing going into the CP... but that's based on the model I'm generalizing from HTML, VoiceXML and CCXML. A different, non-reactive model might have a different breakdown.

Agree on events translated into document transfer; that's a key part of the model. The HP translates events raised by the CP (mouse click, keypress) into state transfer. It's also many-to-one, which is a performance advantage of REST over, say, telnet for a UI.

I am definitely proposing "reactive" agents over "goal-driven" -- I keep stressing the point because nearly 100% of the REST developers I encounter "in the wild" have made a mental jump to goal-driven agents without even considering using a "reactive" model to solve their problems. Reactive agents are proven and so trying that approach first makes the most sense to me. But I don't see anyone trying it at all.

I often hear "reactive" agents equated with "user interfaces". i.e. developers assuming that their machine-to-machine scenario rules out the reactive model. But reactive agents are not restricted to user interfaces -- CCXML is proof of this.

I also think that hypermedia design is tied to the agent design, i.e. there is a difference between "reactive" and "goal-driven"/"action-driven" hypermedia. HTML, VoiceXML and CCXML, for instance, have a built-in concept of "events" from an underlying platform.

Atom is arguably "Action-driven" hypermedia and I suspect that many of the client-server decoupling problems that folks have with using Atom (beyond publishing) are related to this. I have a hard time seeing how the "actions" the client is trying to effect don't constrain server behavior to a restricted set of behaviors defined in the media type and baked into the client.

A "reactive" media type instead places constraints on client behaviors. And, just as we see in the evolution of HTML, the client can't evolve without evolving the media type. Also, servicing multiple types of clients (e.g. an HTML browser and a VoiceXML browser) requires supporting multiple media types, but this is the point of conneg IMO.

I don't know if it's a typo, but you close with the question of how to make hypertext and link relations evolvable, so as to expand the contracts that can be expressed beyond HTTP semantics. To me, this feels like the wrong goal. With a reactive agent I can evolve a service while keeping the hypertext and relations constant and using the uniform interface. If you can do that with action-driven agents I would be impressed, and maybe you can and I just don't see it (years of doing the reactive thing, I guess).

Anyway, I'll stop there -- I'm eager to hear your ideas (coming in Part 2 I guess), and the more of your time I spend on these comments, the longer I'll have to wait for them!

Stu, great that you bring up Mark's work! I have not worked with RDF since 2003, but looking at his subclassing approach with fresh eyes is quite exciting. However, mentioning RDF in standard enterprise projects is like asking to be kicked out right away :-)

Anyway - when it comes to describing resources there is really nothing more suitable than RDF (what a surprise :-). Maybe it is time to investigate uses of RDF again (RDF Forms comes to mind, too).

Hmmm, I've been thinking about the error handling you and Jan were discussing earlier... I wonder if you are essentially removing the receiver processing constraints from the actions/command messages to the point where they are more like events,
i.e. "BuyBook" becomes "Hey, I'd really like to buy a book now if that's OK with you... but if not, that's fine", which says more about the sender than the receiver.


About this Entry

This page contains a single entry by Stu published on March 29, 2010 4:44 PM.


(C) 2003-2011 Stuart Charlton


Disclaimer: All opinions expressed in this blog are my own, and are not necessarily shared by my employer or any other organization I am affiliated with.