Update: Comments should be working now.
This is my attempt to summarize an overview of my thinking on RESTful versioning. It's a follow up to Square Peg, REST hole. These concepts can be tricky concepts to describe, and I don't really want to write a small book on this topic, so I may get some of this wrong. Thus, expect updates to this entry to improve it in the future.
Data Versioning vs. Language Versioning
Extensibility and versioning in RESTful services can be viewed in terms of two domains of agreement. The two domains are: resource and representation, which could also be thought of as the "data" vs. "language" domains.
First, let's recall what a resource is: a time varying membership function, where the members are instances of a representation at various points in time. The resource can return different values at different times. BUT resources can be narrowed down into very specific semantics, if resource owner wishes. A resource might be "the most recent version" of a record, whose state might change often, or it might be a "specific version" of a record, and thus unchanging in state. These are two different resources, even though they may have the same representation for a period of time. A resource may even contain format metadata and constrain the language emitted, though content negotiation may be preferred.
Regardless of how often the values change, the semantics of the resource should not change. "Revision 3 of purchase order 123" should retain that meaning. If they do change the meaning, it hurts consumers that relied on the old meaning.
When we think of URI versioning, this is a design choice when resources are immutable across time and we create new resources for state changes (similar to how we manage time-series data in a database).
With language extension or versioning, on the other hand, the state is unchanged, but the way that data is represented has changed.
On Language Versioning
Rule #1: Prefer to extend a language in a forwards and/or backwards compatibile manner. Version indicators are a last resort, to denote incompatible changes.
Extension, of course, requires thought. It implies well-specified interpretation policies for language consumers, and in the case of a machine-readable schema, well-specified extension points. But the range of choices aren't too hard to understand.
This table summarizes the current techniques in practice for extensible or versioned languages, using the terminology from the W3C TAG's draft versioning compatibility strategies document, by David Orchard, which I'm going to butcher through my own brief summaries.
| | Consumer | Producer |
| Backwards-Compatible | - Lookup version notifications
| - Replacement or Side-by-side
- Version notification via out-of-band channel or links
|
| Forwards-Compatible | - Must accept unknowns
- Must preserve unknowns if persisting state
- Version identifier substitution model
|
- Media type specification clearly defines consumer forward compatibility expectations (and/or uses a machine-readable schema to denote forward-compatibility extension areas)
|
| Incompatible | - Check for version identifier
| - Side-by-side or Breaking Replacement
|
Some explanations...
Version Notifications
Agents should be notified of new versions. This can be done out-of-band (email, physical letter, carrier pigeon), but it helps to complement this with links. These links could be an extended, and agreed to link relation, and/or as part of the media type specification. The links may point to a description of the version change, or, in the case of a Side-by-Side, the URI that emits the resource in the new language version.
Replacement
This implies that origin server is replaced by a new backwards-compatible version that is able to accept both old versions and new versions of representations sent by a client (usually via a POST link). This is useful in combination with a forward compatible change -- none of the links need to change.
Side by Side
This implies that the origin server provides a new MIME type or URI-space for resources using the new language, along side old resources using the old language. In either case, you are impyling "this language changes everything". In the case of changing URIs to reflect the new language verison, in effect, you're using "resource versioning", something usually relegated to storing time series data , as a means to work around your language compatibility problems.
To make this RESTful, your media type must include a link from the old resources to their new version, along with metadata indicating the version of the language used at the URI, possibly including a link to a machine-readable schema of the new version (if your media type has such a thing, like XML with Relax NG or XSD). In the case of a new MIME type, you would want a link relation that notes an alternate format is available.
Let me underscore this: You cannot expect clients to understand your URI format and swap out all occurrences of "v1" with "v2"., if you do, you're placing a heavy burden of coupling on your client, that YOUR SERVER is so special, that they need to understand YOUR URI format. This is completely antithetical to why we would want to use REST in the first place, unless you're really just tunnelling XML over HTTP for the heck of it. I note that many "REST APIs" out there actually are built this way, which means they're just as point-to-point coupled as other interface styles.
Must Accept Unknowns
If the consumer sees elements in the data it doesn't recognize, it still accepts the representation. Generally, it ignores these elements for processing.
Must preserve unknowns if persisting state
This is an optional follow-on from "Must accept unknowns", and is often forgotten. If representation state is being persisted (i.e. cached) in the consumer's domain for later use, the unrecognized elements should be preserved, and not stripped. This could greatly assist forward compatibility when the client is upgraded to handle the previously unrecognized elements.
Version identifier substitution model
I defer to Section 5.3 of the compatibility strategies document.
Where do you place the version identifier?
In order of preference:
- In the media type content
- In the MIME type itself, or as a MIME type parameters
- In the URI
Version identifier inside the media type content
This has many examples in the wild, such as HTML DOCTYPE, some uses of XMLNS, a version identifier inside your PDF document.
This requires the replacement model for backwards-compatibility, and encourages the greater use of forwards compatibility. It's the way that most web media types have long worked, with varying degrees of success, but note that those formats were long designed with forward compatibility in mind.
It's still possible to combine this approach into side-by-side versioning if need be, especially if you are changing the semantics of your resources.
Version identifier in the MIME type
e.g. application/vnd.mytype;version=2
This is currently a non-standard and debatable technique. The benefit here is that this enables side-by-side versioning without impacting the URI-space. On the other hand, this reeks of avoiding hypermedia and trying to push things to the other layers of the Web Architecture (HTTP and/or URIs). But in many cases this is preferable to a new URI space.
Version identifier in the URI
e.g. http://example.com/data/v1/po/123
I described the primary problem here earlier: you can't assume you are a special snowflake and the client will know that 'v1' is your magic crystal. You must provide a link or a URI template in the media itself (and/or in a service resource) to denote new versions.
The secondary problem is bookmarks, or inbound hyperlinks. In a database system these are known as "foreign keys". Anyone with a relational data background knows that their primary keys really shouldn't change, as it's expensive to propagate that change to foreign keys.
There is, however, one case, where this approach is preferred over the others. This ties back to the beginning of this entry, when I discussed "Resource Versioning". It's clear we mint URIs when the semantics of the resource itself changes. So, if they change with the language, then mint new URIs -- using hypermedia, if possible, to link old concepts to new ones, as this requires a side-by-side compatibility approach.
For example, if we have an Account resource, and a new version of our resources and language we are deprecating the notion of account, and adding two new resources, "Customer" and "Agreement". It makes no sense to preserve the Account URIs for new Customer resources in this case, as the changed meaning would be confusing to clients expecting an Account.
Some Q&A
Aren't bookmarks the problem? Wouldn't life be better if we rejected bookmarked URIs?
Well, yes, they're a problem, but no, life would suck if we rejected bookmarks, because there's no different between a hyperlink and a bookmark. It would be like saying "no one can hyperlink to me", which is absurd.
Wouldn't versioning be simpler if we separated access from identification, like with WSDL services?
If my data identifiers become opaque primary keys like 123 instead of http://example.org/po/123, then they're tightly coupled to the service that produced the document, as it would be the only context in which I could resolve details for that identifier. Now clearly one benefit is, if I create a new incompatible side-by-side service version, technically (assuming I don't need to re-key my database), the stored foreign keys don't change.
In a RESTful approach, URIs are your "foreign keys", and if you embed a version identifier in them, they need to change when you upgrade to the next version if you embed those versions in the URI. Assuming you can't convince your resource owners to use languages with version identifiers as a MIME parameter or inside the language itself, how is that done?
In a word, lazily.
As I've discussed above, your media type should have an extensibility section or link relation(s) that points to the new version. And upon retiring a language at a particular URI, you would use a permanent redirect (301) to tell all consumers to update their bookmarks / foreign keys. In either case, the agent would have the ability to update their persistent reference.
Again, this is a special case -- there really shouldn't be that many incompatible versions, they should be forward-compatible changes that dont' require new URIs unless you're completely mucking with the resource semantics.
In Summary
- Prefer extensible, forwards & backwards compatible languages and the replacement approach to compatibility. Note the W3C TAG's position on version identifiers
- Be judicious when you use version identifiers in URIs, as cool URIs don't change
- For side-by-side deployments, always include a section in your media, or link relation(s), to point to new/old versions, and update references lazily as the consumer refreshes its cached value. Use permanent redirects to retire URIs bound to old language versions.
- Version URIs if the semantics of the resource changed, but be courteous to consumers by ensuring links are available to denote the old vs. new alternates
- Chapter 13 of Subbu's wonderful new book RESTful Web Services Cookbook provides more detailed illustrations of several versioning techniques.