Versioning RESTful Web Resources - A Survey


Update: Comments should be working now.

This is my attempt to summarize my thinking on RESTful versioning. It's a follow-up to Square Peg, REST hole. These concepts can be tricky to describe, and I don't really want to write a small book on this topic, so I may get some of this wrong. Expect updates to this entry to improve it in the future.

Data Versioning vs. Language Versioning

Extensibility and versioning in RESTful services can be viewed in terms of two domains of agreement. The two domains are: resource and representation, which could also be thought of as the "data" vs. "language" domains.

First, let's recall what a resource is: a time-varying membership function, where the members are instances of a representation at various points in time. The resource can return different values at different times. But a resource can be narrowed down to very specific semantics, if the resource owner wishes. A resource might be "the most recent version" of a record, whose state might change often, or it might be a "specific version" of a record, and thus unchanging in state. These are two different resources, even though they may have the same representation for a period of time. A resource may even contain format metadata and constrain the language emitted, though content negotiation may be preferred.

Regardless of how often the values change, the semantics of the resource should not change. "Revision 3 of purchase order 123" should retain that meaning. If the resource owner does change the meaning, it hurts consumers that relied on the old meaning.

URI versioning is the design choice we make when resources are immutable across time and we create new resources for state changes (similar to how we manage time-series data in a database).
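As a rough illustration of the distinction above, here is a minimal Python sketch of two kinds of resource over the same record: a "latest" resource whose state changes, and a "specific revision" resource whose state never does. The class, the method names and the URI paths are invented for illustration; they aren't part of any framework.

```python
class PurchaseOrders:
    """In-memory store; every state change becomes a new, immutable revision."""

    def __init__(self):
        self._revisions = {}  # po_id -> list of states, oldest first

    def record(self, po_id, state):
        """Store a new state and return the (hypothetical) URI of that revision."""
        revisions = self._revisions.setdefault(po_id, [])
        revisions.append(dict(state))  # copy so later caller edits don't leak in
        return f"/po/{po_id}/rev/{len(revisions)}"

    def latest(self, po_id):
        """The 'most recent version' resource: its value varies over time."""
        return dict(self._revisions[po_id][-1])

    def revision(self, po_id, number):
        """The 'specific version' resource: same meaning and value forever."""
        if number < 1:
            raise KeyError(number)
        return dict(self._revisions[po_id][number - 1])


store = PurchaseOrders()
first_uri = store.record(123, {"total": 40})
second_uri = store.record(123, {"total": 45})
```

After the second call, `store.latest(123)` reflects the new state, while `store.revision(123, 1)` still returns what it always did.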

With language extension or versioning, on the other hand, the state is unchanged, but the way that data is represented has changed.

On Language Versioning

Rule #1: Prefer to extend a language in a forwards and/or backwards compatible manner. Version indicators are a last resort, to denote incompatible changes.

Extension, of course, requires thought. It implies well-specified interpretation policies for language consumers, and in the case of a machine-readable schema, well-specified extension points. But the range of choices isn't too hard to understand.

This table summarizes the current techniques in practice for extensible or versioned languages, using the terminology from the W3C TAG's draft versioning compatibility strategies document, by David Orchard, which I'm going to butcher through my own brief summaries.

 Consumer | Producer
Backwards-Compatible
  • Lookup version notifications
  • Replacement or Side-by-side
  • Version notification via out-of-band channel or links
Forwards-Compatible
  • Must accept unknowns
  • Must preserve unknowns if persisting state
  • Version identifier substitution model
  • Media type specification clearly defines consumer forward compatibility expectations (and/or uses a machine-readable schema to denote forward-compatibility extension areas)
Incompatible
  • Check for version identifier
  • Side-by-side or Breaking Replacement

Some explanations...

Version Notifications

Agents should be notified of new versions. This can be done out-of-band (email, physical letter, carrier pigeon), but it helps to complement this with links. These links could use an extended, agreed-upon link relation, and/or be defined as part of the media type specification. The links may point to a description of the version change or, in the case of Side-by-Side, to the URI that emits the resource in the new language version.

Replacement

This implies that the origin server is replaced by a new backwards-compatible version that is able to accept both old versions and new versions of representations sent by a client (usually via a POST link). This is useful in combination with a forward compatible change -- none of the links need to change.
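Here is a minimal sketch of what a replacement-style server might do when a client POSTs a purchase order in either the old or the new language version. The member names and the default currency are assumptions made up for this example.

```python
def accept_order(doc):
    """Accept both the old shape (no 'currency') and the new shape."""
    missing = [name for name in ("id", "total") if name not in doc]
    if missing:
        raise ValueError(f"missing required members: {missing}")
    return {
        "id": doc["id"],
        "total": doc["total"],
        # Old clients never send 'currency'; fall back to an assumed default.
        "currency": doc.get("currency", "USD"),
    }


old_style = accept_order({"id": 123, "total": 40})
new_style = accept_order({"id": 124, "total": 10, "currency": "CAD"})
```

Both calls succeed against the same handler, so no link that pointed at the old server needs to change.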

Side by Side

This implies that the origin server provides a new MIME type or URI-space for resources using the new language, alongside old resources using the old language. In either case, you are implying "this language changes everything". In the case of changing URIs to reflect the new language version, in effect, you're using "resource versioning", something usually relegated to storing time series data, as a means to work around your language compatibility problems.

To make this RESTful, your media type must include a link from the old resources to their new version, along with metadata indicating the version of the language used at the URI, possibly including a link to a machine-readable schema of the new version (if your media type has such a thing, like XML with Relax NG or XSD). In the case of a new MIME type, you would want a link relation that notes an alternate format is available.

Let me underscore this: You cannot expect clients to understand your URI format and swap out all occurrences of "v1" with "v2". If you do, you're placing a heavy burden of coupling on your client: that YOUR SERVER is so special that they need to understand YOUR URI format. This is completely antithetical to why we would want to use REST in the first place, unless you're really just tunnelling XML over HTTP for the heck of it. I note that many "REST APIs" out there actually are built this way, which means they're just as point-to-point coupled as other interface styles.
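To make the point concrete, here is a small sketch of a client that finds the new version by following a link in the old representation, rather than by rewriting the URI string. The JSON-like shape of the representation, the `links` member, the `v2` URI and the `next-language-version` relation name are all assumptions invented for this example; a real media type would define its own.

```python
def find_link(representation, rel):
    """Return the href of the first link with the given relation, or None."""
    for link in representation.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


old_po = {
    "id": 123,
    "links": [
        {"rel": "self", "href": "http://example.com/data/v1/po/123"},
        {
            "rel": "next-language-version",  # hypothetical relation name
            "href": "http://example.com/data/v2/po/123",
        },
    ],
}

new_uri = find_link(old_po, "next-language-version")
```

The client never needs to know that "v1" appears in the URI; it only needs to know the relation name that the media type specifies.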

Must Accept Unknowns

If the consumer sees elements in the data it doesn't recognize, it still accepts the representation. Generally, it ignores these elements for processing.

Must preserve unknowns if persisting state

This is an optional follow-on from "Must accept unknowns", and is often forgotten. If representation state is being persisted (i.e. cached) in the consumer's domain for later use, the unrecognized elements should be preserved, and not stripped. This could greatly assist forward compatibility when the client is upgraded to handle the previously unrecognized elements.
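A minimal sketch of both rules, assuming a JSON representation of a purchase order. The member names (`id`, `total`, `currency`) are hypothetical: the consumer only understands the first two, yet it still accepts the third and writes it back out when persisting.

```python
import json

KNOWN_FIELDS = {"id", "total"}  # hypothetical members this consumer understands


def load_order(raw):
    """Accept the representation even if it carries unrecognized members."""
    doc = json.loads(raw)
    known = {k: v for k, v in doc.items() if k in KNOWN_FIELDS}
    unknown = {k: v for k, v in doc.items() if k not in KNOWN_FIELDS}
    return {"known": known, "unknown": unknown}


def save_order(order):
    """Persist known and unknown members together so nothing is stripped."""
    merged = {**order["unknown"], **order["known"]}
    return json.dumps(merged, sort_keys=True)


incoming = '{"id": 123, "total": 40, "currency": "CAD"}'
order = load_order(incoming)
order["known"]["total"] = 45  # processing only touches what we understand
stored = save_order(order)
```

When the client is later upgraded to understand `currency`, the value is still there in the stored copy.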

Version identifier substitution model

I defer to Section 5.3 of the compatibility strategies document.

Where do you place the version identifier?

In order of preference:


  1. In the media type content
  2. In the MIME type itself, or as a MIME type parameter
  3. In the URI

Version identifier inside the media type content

This has many examples in the wild, such as the HTML DOCTYPE, some uses of XMLNS, or the version identifier inside a PDF document.

This requires the replacement model for backwards-compatibility, and encourages the greater use of forwards compatibility. It's the way that most web media types have long worked, with varying degrees of success, but note that those formats were long designed with forward compatibility in mind.

It's still possible to combine this approach with side-by-side versioning if need be, especially if you are changing the semantics of your resources.
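On the consumer side, "check for version identifier" can be as simple as the sketch below. It assumes a `version` member carried inside the content in "major.minor" form, and it assumes that only a change in the major number signals an incompatible change; both are conventions picked for this example, not something every media type follows.

```python
SUPPORTED_MAJOR = 1  # assumed: the only major version this consumer implements


def check_version(doc):
    """Return (major, minor), or raise if the language version is incompatible."""
    raw = str(doc.get("version", "1.0"))  # assumed default when absent
    major_text, _, minor_text = raw.partition(".")
    major, minor = int(major_text), int(minor_text or 0)
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported language version {raw}")
    return major, minor


newer_minor = check_version({"version": "1.3", "id": 123})
no_identifier = check_version({"id": 123})
```

A newer minor version is accepted (and any unknown members ignored), while a document declaring version 2.0 would be rejected up front.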

Version identifier in the MIME type

e.g. application/vnd.mytype;version=2

This is currently a non-standard and debatable technique. The benefit here is that this enables side-by-side versioning without impacting the URI-space. On the other hand, this reeks of avoiding hypermedia and trying to push things to the other layers of the Web Architecture (HTTP and/or URIs). But in many cases this is preferable to a new URI space.
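For illustration, a server could pick a representation based on that parameter roughly as follows. This is a sketch under assumptions: the two renderers and their member names are made up, a missing parameter is assumed to mean "newest", and the parser is deliberately naive (it does not handle quoted values containing semicolons, q-values, or multiple media ranges).

```python
def parse_media_type(value):
    """Split 'type/subtype;name=value' into (type, {name: value})."""
    base, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        name, sep, val = raw.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return base.lower(), params


# Hypothetical renderers for two language versions of the same resource state.
RENDERERS = {
    "1": lambda po: {"id": po["id"], "amount": po["total"]},
    "2": lambda po: {"id": po["id"], "total": po["total"], "currency": po["currency"]},
}


def render(po, requested_type):
    media_type, params = parse_media_type(requested_type)
    if media_type != "application/vnd.mytype":
        raise LookupError(f"unsupported media type {media_type}")
    version = params.get("version", "2")  # assumed default: newest version
    if version not in RENDERERS:
        raise LookupError(f"unsupported version {version}")
    return RENDERERS[version](po)


po = {"id": 123, "total": 40, "currency": "CAD"}
v1_body = render(po, "application/vnd.mytype;version=1")
v2_body = render(po, "application/vnd.mytype;version=2")
```

The state is the same in both cases; only the language used to represent it differs, and the URI stays put.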

Version identifier in the URI

e.g. http://example.com/data/v1/po/123

I described the primary problem here earlier: you can't assume you are a special snowflake and the client will know that 'v1' is your magic crystal. You must provide a link or a URI template in the media itself (and/or in a service resource) to denote new versions.

The secondary problem is bookmarks, or inbound hyperlinks. In a database system these are known as "foreign keys". Anyone with a relational data background knows that their primary keys really shouldn't change, as it's expensive to propagate that change to foreign keys.

There is, however, one case where this approach is preferred over the others. This ties back to the beginning of this entry, when I discussed "Resource Versioning". It's clear we mint URIs when the semantics of the resource itself changes. So, if they change with the language, then mint new URIs -- using hypermedia, if possible, to link old concepts to new ones, as this requires a side-by-side compatibility approach.

For example, suppose we have an Account resource, and in a new version of our resources and language we are deprecating the notion of account and adding two new resources, "Customer" and "Agreement". It makes no sense to preserve the Account URIs for new Customer resources in this case, as the changed meaning would be confusing to clients expecting an Account.

Some Q&A

Aren't bookmarks the problem? Wouldn't life be better if we rejected bookmarked URIs?

Well, yes, they're a problem, but no, life would suck if we rejected bookmarks, because there's no difference between a hyperlink and a bookmark. It would be like saying "no one can hyperlink to me", which is absurd.

Wouldn't versioning be simpler if we separated access from identification, like with WSDL services?

If my data identifiers become opaque primary keys like 123 instead of http://example.org/po/123, then they're tightly coupled to the service that produced the document, as it would be the only context in which I could resolve details for that identifier. Now clearly one benefit is, if I create a new incompatible side-by-side service version, technically (assuming I don't need to re-key my database), the stored foreign keys don't change.

In a RESTful approach, URIs are your "foreign keys", and if you embed a version identifier in them, they need to change when you upgrade to the next version if you embed those versions in the URI. Assuming you can't convince your resource owners to use languages with version identifiers as a MIME parameter or inside the language itself, how is that done?

In a word, lazily.

As I've discussed above, your media type should have an extensibility section or link relation(s) that points to the new version. And upon retiring a language at a particular URI, you would use a permanent redirect (301) to tell all consumers to update their bookmarks / foreign keys. In either case, the agent would have the ability to update their persistent reference.
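A sketch of the lazy update from the consumer's side: the stored reference is only rewritten when the consumer next refreshes it and the server answers with a 301. To keep the example self-contained, the network is replaced by a dictionary standing in for the server, and the `v2` URI is invented for illustration.

```python
def refresh(references, key, fetch, max_hops=5):
    """Fetch a stored reference, updating it in place on permanent redirects."""
    uri = references[key]
    for _ in range(max_hops):
        status, headers, body = fetch(uri)
        if status == 301:
            uri = headers["Location"]
            references[key] = uri  # permanent: update the stored "foreign key"
            continue
        return status, body
    raise RuntimeError("too many redirects")


# Stand-in for the origin server after the v1 language has been retired.
FAKE_SERVER = {
    "http://example.com/data/v1/po/123": (
        301,
        {"Location": "http://example.com/data/v2/po/123"},
        "",
    ),
    "http://example.com/data/v2/po/123": (200, {}, "purchase order 123"),
}

references = {"po": "http://example.com/data/v1/po/123"}
result = refresh(references, "po", FAKE_SERVER.__getitem__)
```

Nothing is migrated in bulk; each consumer fixes up its own reference the first time it touches it.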

Again, this is a special case -- there really shouldn't be that many incompatible versions, they should be forward-compatible changes that don't require new URIs unless you're completely mucking with the resource semantics.

In Summary

  1. Prefer extensible, forwards & backwards compatible languages and the replacement approach to compatibility. Note the W3C TAG's position on version identifiers
  2. Be judicious when you use version identifiers in URIs, as cool URIs don't change
  3. For side-by-side deployments, always include a section in your media, or link relation(s), to point to new/old versions, and update references lazily as the consumer refreshes its cached value. Use permanent redirects to retire URIs bound to old language versions.
  4. Version URIs if the semantics of the resource changed, but be courteous to consumers by ensuring links are available to denote the old vs. new alternates
  5. Chapter 13 of Subbu's wonderful new book RESTful Web Services Cookbook provides more detailed illustrations of several versioning techniques.

8 Comments

Stu:

Nice post.

The biggest challenge I face in my current work is handling application-flow versioning (not media-type or data-element changes). I often work in a multi-tenant app environment where custom pathways must be designed for the application. And they can change over time. Hypermedia makes this possible (yay!), but there are many times when an application-flow update requires support for side-by-side versioning. The puzzle there is how to properly signal the side-by-side within the media type.

I'm curious about your (or anyone else's) experience w/ this pattern (#3 in your summary).

My favorite approach is closest to the way that "forwards compatible" versioning is described above, but with a couple of twists:

  • All media types are defined as extensible -- the server can add additional items at any time that the client may not understand. Thus, you don't need a new version number every time a field is added.
  • Server provides a resource that includes, among other information, the set of version identifiers that it understands.
  • The client MAY include an indication (via an HTTP header, if you're doing HTTP) that "this is the version I am programmed to understand" which the server can use to provide different processing, or redirect, or reject as appropriate.
  • If the client does not say anything about versioning, the server assumes they want the latest and greatest version.

With this approach, the server has all sorts of flexibility in how to support clients that do present a version number identifier, including:

  • Ignore the identifier if the only thing that has changed is adding some new fields ... the client will ignore them anyway.
  • Leave out the new fields (should not be required, but provides a more robust response to naively coded clients).
  • Internally dispatch to different code within the same service, based on the client's requested preference.
  • Redirect the client to a legacy installation of the service that still supports the old version.
  • Refuse to serve this version, returning a suitable 4xx response status if using HTTP.
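A few of the options in the list above can be pictured with a small sketch like this one. The header name `X-Client-Version`, the version numbers and the choice of 400 as the rejection status are all assumptions picked for the example, and the header lookup assumes keys have already been normalized.

```python
LATEST = "2"
SUPPORTED = {"1", "2"}  # hypothetical versions this server still serves


def choose_handling(headers):
    """Decide how to treat a request based on an optional version header."""
    requested = headers.get("X-Client-Version")  # hypothetical header name
    if requested is None:
        return ("serve", LATEST)  # silent clients get the latest and greatest
    if requested in SUPPORTED:
        return ("serve", requested)  # dispatch internally on the preference
    return ("reject", 400)  # assumed status; any suitable 4xx would do


no_header = choose_handling({})
old_client = choose_handling({"X-Client-Version": "1"})
too_old = choose_handling({"X-Client-Version": "0"})
```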

When using this approach, you're not required to pollute your media types or your URLs with version identifiers. Of course, you can always do that if you really need to for major version changes, but you could also decide to view this as offering a completely different service (with its own media types), and let the normal content negotiation process deal with making sure that the right client gets matched up with the right server.

Of course, this whole thing works best if your service is respecting the hypermedia (HATEOAS) constraint of REST, with the client discovering all the interesting URLs by receiving them from the server, rather than trying to plug their own values into a URL template of some sort.

Stu:

nice effort, but unfortunately you miss the mark on "resource lifecycles" and actions. You carefully avoid the concept of tying actions (i.e. POST a_noun) into specific sets that relate to a particular lifecycle. Unfortunately, you cannot avoid that boundary as Mike explained. For instance a customer exhibits several simultaneous states (marketing e.g. gold, financials e.g. default, support eg level 1, mood e.g happy...), this is often the case for any kind of resource because each resource plays in different business domains (marketing, sales, orders, support, ...).

Each lifecycle (and set of actions) will have to be versioned separately and together as a set.

You are kidding with the "version link relation"? right? This is complete science fiction both from a user perspective and from a server-to-server perspective.

Unfortunately REST cannot dig its way out of that one. As it is the case for the "uniform interface", and many other problems.

You pointed out correctly: "In a RESTful approach, URIs are your "foreign keys", and if you embed a version identifier in them, they need to change when you upgrade to the next version if you embed those versions in the URI." As I explained before, access and identity are coupled and this is lethal when building distributed systems.

As I said, REST is a fraud, you are forcing everyone to hand-code semantics that were given for free in WS-*. We know how consistent this encoding process is going to be. Nice job guys !

So why don't we/you admit that indeed, REST, as it was explained to the world in 2007 by a few individuals, does not work and continuing to use it RESTlessly will lead to massive failures in IT. It will become the worst spaghetti ever built since Turing defined our field.

JJ-

Stu:

Sorry for the late reply - was tied up with some papers to review[grin].

Yes, I've kept an eye on the version link relations I-D. That effort seems aimed at D/VCS models and does not seem readily applicable to cases where the flow of the application changes over time and you want to maintain links to previous flows where appropriate (mostly due to client limitations or preferences).

The most recent cases where this has challenged me are also ones that must support common browsers which adds some additional hurdles.

My current solution is to maintain additional application information on the origin server that knows the most recent version of a particular workflow available for the connected client. This ensures auto-upgrading of common browsers, but adds a burden to the server and does not easily allow browsers to 'downgrade' their version preference. Non-browser clients can include an optional header to indicate a version preference (default is the latest, à la common browsers).

This solution works and reduces link-bloat for often-modified workflows but depends on a custom header that must be included in the Vary collection and is a non-solution for common browsers.

It's messy but meets the requirements before me.

I stumbled into this nice blog on versioning. I think JJ might be confused. I can see very well that versioning interactions will be quite impossible if you publish too many URIs. But, if you don't, and you are disciplined to look for links in your representations that define and expose certain actions, behaviors, business processes, and information, how can REST not solve versioning pretty seamlessly? i.e.




If you follow link 2, you'll get to interact with new media types, new links, etc. In version 2, many (all) of the same link interactions could be available, they just might accept new or versioned media types, etc.


About this Entry

This page contains a single entry by Stu published on March 3, 2010 2:26 AM.

(C) 2003-2011 Stuart Charlton
