Recently in Tech Category

Update: Comments should be working now.

This is my attempt to summarize my thinking on RESTful versioning. It's a follow-up to Square Peg, REST hole. These concepts can be tricky to describe, and I don't really want to write a small book on this topic, so I may get some of this wrong. Thus, expect updates to this entry to improve it in the future.

Data Versioning vs. Language Versioning

Extensibility and versioning in RESTful services can be viewed in terms of two domains of agreement. The two domains are: resource and representation, which could also be thought of as the "data" vs. "language" domains.

First, let's recall what a resource is: a time-varying membership function, where the members are instances of a representation at various points in time. The resource can return different values at different times. But resources can be narrowed down into very specific semantics, if the resource owner wishes. A resource might be "the most recent version" of a record, whose state might change often, or it might be a "specific version" of a record, and thus unchanging in state. These are two different resources, even though they may have the same representation for a period of time. A resource may even contain format metadata and constrain the language emitted, though content negotiation may be preferred.

Regardless of how often the values change, the semantics of the resource should not change. "Revision 3 of purchase order 123" should retain that meaning. If the resource owner does change the meaning, it hurts consumers that relied on the old meaning.

URI versioning is a design choice for when resources are immutable across time and we create new resources for state changes (similar to how we manage time-series data in a database).

With language extension or versioning, on the other hand, the state is unchanged, but the way that data is represented has changed.

On Language Versioning

Rule #1: Prefer to extend a language in a forwards and/or backwards compatible manner. Version indicators are a last resort, to denote incompatible changes.

Extension, of course, requires thought. It implies well-specified interpretation policies for language consumers, and in the case of a machine-readable schema, well-specified extension points. But the range of choices isn't too hard to understand.

This table summarizes the current techniques in practice for extensible or versioned languages, using the terminology from the W3C TAG's draft versioning compatibility strategies document, by David Orchard, which I'm going to butcher through my own brief summaries.

Backwards-Compatible
  Consumer:
  • Lookup version notifications
  Producer:
  • Replacement or Side-by-side
  • Version notification via out-of-band channel or links
Forwards-Compatible
  Consumer:
  • Must accept unknowns
  • Must preserve unknowns if persisting state
  Producer:
  • Version identifier substitution model
  • Media type specification clearly defines consumer forward compatibility expectations (and/or uses a machine-readable schema to denote forward-compatibility extension areas)
Incompatible
  Consumer:
  • Check for version identifier
  Producer:
  • Side-by-side or Breaking Replacement

Some explanations...

Version Notifications

Agents should be notified of new versions. This can be done out-of-band (email, physical letter, carrier pigeon), but it helps to complement this with links. These links could use an extended, agreed-upon link relation, and/or be defined as part of the media type specification. The links may point to a description of the version change, or, in the case of Side-by-Side, to the URI that emits the resource in the new language version.

Replacement

This implies that the origin server is replaced by a new backwards-compatible version that is able to accept both old versions and new versions of representations sent by a client (usually via a POST link). This is useful in combination with a forward compatible change -- none of the links need to change.

Side by Side

This implies that the origin server provides a new MIME type or URI-space for resources using the new language, alongside old resources using the old language. In either case, you are implying "this language changes everything". In the case of changing URIs to reflect the new language version, in effect, you're using "resource versioning", something usually relegated to storing time series data, as a means to work around your language compatibility problems.

To make this RESTful, your media type must include a link from the old resources to their new version, along with metadata indicating the version of the language used at the URI, possibly including a link to a machine-readable schema of the new version (if your media type has such a thing, like XML with Relax NG or XSD). In the case of a new MIME type, you would want a link relation that notes an alternate format is available.

Let me underscore this: You cannot expect clients to understand your URI format and swap out all occurrences of "v1" with "v2". If you do, you're placing a heavy burden of coupling on your client: that YOUR SERVER is so special that they need to understand YOUR URI format. This is completely antithetical to why we would want to use REST in the first place, unless you're really just tunnelling XML over HTTP for the heck of it. I note that many "REST APIs" out there actually are built this way, which means they're just as point-to-point coupled as other interface styles.
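To make the contrast concrete, here is a minimal sketch of a client that discovers the new version by following a link in the representation rather than rewriting the URI. The JSON-like shape of the representation, the "links" member, the "successor-version" relation name and the v2 URI are all assumptions chosen for illustration; your media type would define its own.

```python
def find_link(representation: dict, rel: str):
    """Return the href of the first link with the given relation, or None."""
    for link in representation.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Hypothetical representation served at the old URI.
old_po = {
    "id": 123,
    "links": [
        {"rel": "self", "href": "http://example.com/data/v1/po/123"},
        {"rel": "successor-version", "href": "http://example.com/data/v2/po/123"},
    ],
}

# The client never inspects or edits the "v1" path segment; it only follows the link.
next_uri = find_link(old_po, "successor-version")
```

The client's only coupling here is to the link relation name, which is part of the agreed language, not to the server's URI layout.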

Must Accept Unknowns

If the consumer sees elements in the data it doesn't recognize, it still accepts the representation. Generally, it ignores these elements for processing.

Must preserve unknowns if persisting state

This is an optional follow-on from "Must accept unknowns", and is often forgotten. If representation state is being persisted (i.e. cached) in the consumer's domain for later use, the unrecognized elements should be preserved, and not stripped. This could greatly assist forward compatibility when the client is upgraded to handle the previously unrecognized elements.
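A rough sketch of both rules together, assuming a JSON representation and a consumer that only understands the "id" and "status" members (both the format and the field names are made up for this example):

```python
import json

# Members this (hypothetical) consumer version knows how to process.
KNOWN_FIELDS = {"id", "status"}


def load_order(raw: str) -> dict:
    """Accept the representation even when it carries unrecognized members."""
    doc = json.loads(raw)
    known = {k: v for k, v in doc.items() if k in KNOWN_FIELDS}
    unknown = {k: v for k, v in doc.items() if k not in KNOWN_FIELDS}
    # Unknowns are ignored for processing but kept alongside the known state.
    return {"known": known, "unknown": unknown}


def save_order(order: dict) -> str:
    """Persist state without stripping what we did not understand."""
    merged = {**order["unknown"], **order["known"]}
    return json.dumps(merged, sort_keys=True)


incoming = '{"id": 123, "status": "open", "currency": "CAD"}'
order = load_order(incoming)
order["known"]["status"] = "closed"  # process only what we recognize
stored = save_order(order)           # "currency" survives the round trip
```

When the consumer is later upgraded to understand "currency", the value is already sitting in its stored state.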

Version identifier substitution model

I defer to Section 5.3 of the compatibility strategies document.

Where do you place the version identifier?

In order of preference:


  1. In the media type content

  2. In the MIME type itself, or as a MIME type parameter

  3. In the URI

Version identifier inside the media type content

This has many examples in the wild, such as the HTML DOCTYPE, some uses of XMLNS, or a version identifier inside your PDF document.

This requires the replacement model for backwards-compatibility, and encourages the greater use of forwards compatibility. It's the way that most web media types have long worked, with varying degrees of success, but note that those formats were long designed with forward compatibility in mind.

It's still possible to combine this approach with side-by-side versioning if need be, especially if you are changing the semantics of your resources.

Version identifier in the MIME type

e.g. application/vnd.mytype;version=2

This is currently a non-standard and debatable technique. The benefit here is that this enables side-by-side versioning without impacting the URI-space. On the other hand, this reeks of avoiding hypermedia and trying to push things to the other layers of the Web Architecture (HTTP and/or URIs). But in many cases this is preferable to a new URI space.
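On the consumer side, "check for version identifier" might look something like the sketch below. It assumes the version travels as a "version" parameter as in the example above, that a missing parameter means version 1, and that this consumer supports versions 1 and 2; none of that is standardized.

```python
def parse_media_type(value: str):
    """Split a media type string into (type, params)."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, _, val = part.partition("=")
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


# Assumption: the language versions this consumer was built against.
SUPPORTED_VERSIONS = {"1", "2"}


def check_version(content_type: str) -> str:
    """Return the language version, or raise if it is one we cannot handle."""
    media_type, params = parse_media_type(content_type)
    version = params.get("version", "1")  # assumption: absent means version 1
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported {media_type} version {version}")
    return version


version = check_version("application/vnd.mytype;version=2")
```

The same media type string could be sent in an Accept header to ask for a particular version side-by-side, without touching the URI.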

Version identifier in the URI

e.g. http://example.com/data/v1/po/123

I described the primary problem here earlier: you can't assume you are a special snowflake and the client will know that 'v1' is your magic crystal. You must provide a link or a URI template in the media itself (and/or in a service resource) to denote new versions.

The secondary problem is bookmarks, or inbound hyperlinks. In a database system these are known as "foreign keys". Anyone with a relational data background knows that their primary keys really shouldn't change, as it's expensive to propagate that change to foreign keys.

There is, however, one case where this approach is preferred over the others. This ties back to the beginning of this entry, when I discussed "Resource Versioning". It's clear we mint URIs when the semantics of the resource itself change. So, if they change with the language, then mint new URIs -- using hypermedia, if possible, to link old concepts to new ones, as this requires a side-by-side compatibility approach.

For example, suppose we have an Account resource, and in a new version of our resources and language we are deprecating the notion of account and adding two new resources, "Customer" and "Agreement". It makes no sense to preserve the Account URIs for new Customer resources in this case, as the changed meaning would be confusing to clients expecting an Account.

Some Q&A

Aren't bookmarks the problem? Wouldn't life be better if we rejected bookmarked URIs?

Well, yes, they're a problem, but no, life would suck if we rejected bookmarks, because there's no difference between a hyperlink and a bookmark. It would be like saying "no one can hyperlink to me", which is absurd.

Wouldn't versioning be simpler if we separated access from identification, like with WSDL services?

If my data identifiers become opaque primary keys like 123 instead of http://example.org/po/123, then they're tightly coupled to the service that produced the document, as it would be the only context in which I could resolve details for that identifier. Now clearly one benefit is, if I create a new incompatible side-by-side service version, technically (assuming I don't need to re-key my database), the stored foreign keys don't change.

In a RESTful approach, URIs are your "foreign keys", and if you embed a version identifier in them, they need to change when you upgrade to the next version. Assuming you can't convince your resource owners to use languages with version identifiers as a MIME parameter or inside the language itself, how is that done?

In a word, lazily.

As I've discussed above, your media type should have an extensibility section or link relation(s) that points to the new version. And upon retiring a language at a particular URI, you would use a permanent redirect (301) to tell all consumers to update their bookmarks / foreign keys. In either case, the agent would have the ability to update their persistent reference.
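Here is one way the lazy update could be sketched. The fetch function is a stand-in for an HTTP client and returns a (status, headers, body) tuple; that shape, the stored reference layout and the v2 URI are assumptions for the example, and only a single redirect hop is handled.

```python
def refresh(reference: dict, fetch) -> dict:
    """Re-fetch a stored reference; rewrite it only on a permanent redirect."""
    status, headers, body = fetch(reference["uri"])
    if status == 301 and "Location" in headers:
        # The server says the old URI is retired: update our "foreign key".
        reference["uri"] = headers["Location"]
        status, headers, body = fetch(reference["uri"])
    reference["cached"] = body
    return reference


# Canned responses standing in for a real origin server.
RESPONSES = {
    "http://example.com/data/v1/po/123": (
        301, {"Location": "http://example.com/data/v2/po/123"}, None),
    "http://example.com/data/v2/po/123": (200, {}, {"id": 123}),
}


def fake_fetch(uri):
    return RESPONSES[uri]


ref = refresh({"uri": "http://example.com/data/v1/po/123", "cached": None}, fake_fetch)
```

Nothing is migrated up front; each stored reference gets corrected the next time the consumer happens to refresh it.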

Again, this is a special case -- there really shouldn't be that many incompatible versions, they should be forward-compatible changes that don't require new URIs unless you're completely mucking with the resource semantics.

In Summary

  1. Prefer extensible, forwards & backwards compatible languages and the replacement approach to compatibility. Note the W3C TAG's position on version identifiers
  2. Be judicious when you use version identifiers in URIs, as cool URIs don't change
  3. For side-by-side deployments, always include a section in your media, or link relation(s), to point to new/old versions, and update references lazily as the consumer refreshes its cached value. Use permanent redirects to retire URIs bound to old language versions.
  4. Version URIs if the semantics of the resource changed, but be courteous to consumers by ensuring links are available to denote the old vs. new alternates
  5. Chapter 13 of Subbu's wonderful new book RESTful Web Services Cookbook provides more detailed illustrations of several versioning techniques.

I need home for a REST



Time to dust off my microphone and bring up a couple of topics on REST and the Web Architecture
- Versioning, or "Cool URIs don't change -- but my data format does!"
- Why the Web Architecture could use a Programming Model for the Enterprise

Which I'll try to get to this weekend.

For now, I leave you with two things:

First, reflecting on William's recent Square Peg, REST Hole, I draw from the archives: What are the benefits of WS-* or proprietary services?

Let's keep our eye on the prize. REST is a style aimed at extensible, low entry-barrier, multi-organizational, confederate information sharing and communication. I note that most IT organizations are confederacies, adopting a federal or feudal governance model.

The Web Architecture itself (MIME types, HTTP, URIs) provides a much-needed stable intermediate form for interoperability among many different systems and applications -- something that a general-purpose orientation, like SOA, doesn't really provide. Or, fitting for RESTafarians, it is a shared hallucination ;-)

Not every system, or layer of an enterprise's architecture, has the same requirements for scalability or interoperability. The post from 2007 highlights such examples.

Secondly, the song, which ruled my college years in Canada....


The Trouble with NoSQL


I have an ambivalent feeling towards this NoSQL trend, on a few levels.

a) "RDBMS don't perform or scale".

I've seen this in presentations, blog posts, and even in the Hadoop O'Reilly book. I'm not sure if this is sloppiness, ignorance or plain dishonesty. To anyone paying attention, it's pretty clear that RDBMS do perform and scale: there are several 1+ petabyte Teradata implementations, and Oracle RAC is used heavily at Amazon.com (70 TB) and Yahoo! (250 TB), for example. Of course, this is about scalability in terms of data volume, and huge queries. On the OLTP side, the TPC benchmark continues to show Oracle and DB2 are able to pull out staggering numbers both in classic SMP and in clustered configurations (yes, DB2 can do shared nothing).

This is not to say RDBMS are the solution to all data persistence problems. I'm an old object database guy and there were (and are) many reasons why one would use that (or one of the newer scalable key/value stores like Cassandra). But, please present the technology on its merits, not based on completely misleading claims.

One almost gets the impression that "If it's not open source, it doesn't exist", which is absurd considering the billions Oracle, IBM, Microsoft and Sybase continue to rake in.

b) "In the CAP tradeoffs, availability > consistency, almost always"

Except when you're running financial analyses. Regulators don't like "eventually consistent" accounting statements. Even when you have terabytes of them to go through.

To me, the best approach would be to provide developers and data architects a knob to adjust the level of consistency vs. availability vs. partition-tolerance depending on the circumstance (the query, the data, etc.)
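As a toy illustration of what such a knob might look like, assume a store that keeps N replicas of each item and lets the caller pick how many replicas a read (R) and a write (W) must touch. This quorum-style model is my assumption for the example, not a description of any particular product, and the profile names are invented.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when any set of r replicas must share a member with any set of w replicas."""
    return r + w > n


N = 3  # replicas per item (assumed)

# Hypothetical per-workload settings: how many replicas a read/write waits for.
PROFILES = {
    "financial_report": {"r": 2, "w": 2},   # favour consistency
    "page_view_counter": {"r": 1, "w": 1},  # favour availability and latency
}

results = {
    name: quorums_overlap(N, p["r"], p["w"]) for name, p in PROFILES.items()
}
```

The point is only that the tradeoff can be chosen per query or per data set rather than fixed once for the whole system.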

c) What happened to "Data Management"?

I'd be willing to sacrifice the "R" in RDBMS for certain reasons, but I'm less interested in sacrificing the "MS" part, i.e. "Management System".

There's an eternal battle between those that want the data intertwined with the code and those that want the data separate from the code. I grew up thinking the former, and learned to appreciate the latter.

Every generation of programmers seems to go through this phase where the next-gen persistence engine becomes all the rage. From CODASYL to ODBMS to XML databases to Object caches, and now key/value stores or "cloud databases".

Managing data, scale, and partitioning in the application is a workaround, not a very pleasing solution. I understand people have to get their jobs done, but enterprises seem to have different data management requirements than young companies. In most cases the data exists to support business operations or business decisions. Quality is paramount, and poor management leads to data duplication and mistakes when other applications need to access that data. One tends not to notice these problems early in the life of an application; they tend to emerge across applications that integrate with one another over time.

Similarly, "schema-less" data persistence is only beneficial in the early stages of development. Later on a schema becomes pretty useful, and over time, it's almost essential if you want to reuse or repurpose that data and be able to interpret it consistently without having to crack open the supporting codebase.

And strong DBAs have a unique perspective on data & performance management, one I've found lacking in many a programmer (with the ones at Google being a notable exception; Google truly seems to have instilled the advanced DBA's sensibility into its engineering work).

d) Are you really sure that SQL is the problem?

I can agree that many of the cheaper (or free) RDBMS don't scale well. But why do people think SQL is the reason that they don't scale? It seems like conflating logical with physical issues. The traditional SQL RDBMS model may not be the only way to do logical data management, but relying on programmatic solutions and ad hoc query languages certainly isn't very satisfying, it seems all very 1970's. Throwing out logical data design & management implies horrible long-term consequences on data quality and correct modelling of a business domain.

On this note, there's a new paper contrasting Hadoop with parallel databases for large-scale data analysis tasks (written in part by the Vertica guys - Mike Stonebraker's new company). The conclusions are interesting -- Hadoop isn't the clear performance leader, but it certainly wins major points for being simpler to get going than your traditional DBMS. On the other hand, specifying SQL statements looks quite compelling vs. writing a bunch of map and reduce functions. And the results show these SQL databases certainly perform well on query (assuming you can load them fast enough). Another paper, by some of the same authors, looks at combining Hadoop with a parallel DBMS (probably Vertica) with encouraging results.

Having lived through the object database wars, and watched my beloved databases get trampled into a niche, my sense is this:

- The IT world didn't reboot when clouds came out, there's a lot of assumptions worth challenging, but lots to learn from history.

- There's a real chance most NoSQL solutions will remain niches while RDBMS continue to dominate, because customers will force their vendors to scale them out. The real question is whether this is an impossibility due to the relational model or SQL. I'd say that's highly doubtful. They'll find a way, if their customers pressure them.

- On the other hand, there's a real chance for a NoSQL alternative to make it big and succeed IF it evolves to be a true DBMS, not just a persistence engine, provides adjustable CAP tradeoffs in its interface, and offers us a worthy successor to SQL.

I've uploaded my position paper for the OOPSLA 2009 Cloud Design Workshop next week. This provides a detailed technical overview of what Elastra has been working on for the past year.

Cloud Computing has been a catalyst that has been accelerating a long-needed convergence between IT Operations and Application Architecture. We need to build systems to be operated, managed, and governed -- not as an afterthought. And we need better collaboration between IT specialists. Through a mix of web architecture and a dose of autonomic computing, we may have the beginnings of a new inter-cloud architecture. It feels like the end of a marathon, but we've only reached the first checkpoint.

There are at least six views on Cloud Computing out there, and why they're important. Some people are pretty adamant that their definition is the one true definition, others tend to admit the overlap. Optimists would call this state of affairs "synergy", pessimists would call it "vagueness", cynics would call it "sophistry".


I'd like to distill, briefly, how I see things.

1. Theme: Scale without skill, Availability without avarice

Why Cloud? "Don't worry about Scale or Availability, SuperCloudPlatform Will Take Care of It"
Do: Adopt a Cloud Platform, like Google App Engine, Azure, or Force.com
Don't: Worry about Infrastructure as a service, that's so .... 2006.
Laugh Nervously About: The Magic Architecture & Buzzword Bingo required to make this work. Also, all those PaaS APIs seem rather proprietary....

2. Gimme an A! A! S!

Why Cloud? "Consuming IT as an On-Demand Service instead of as a capital intensive product"
Do: Build out your cloud architecture, with its various layers, and invest in software & services at each layer.
Don't: Get locked into anyone's narrow concept of a cloud. PaaS, IaaS, some SaaS, etc., are all contenders.
Laugh Nervously About: That, as with SOA, everything is a cloud; that you can't buy a cloud, yet everyone seems to be trying to sell you one.

3. Efficiency through Outsourcing

Why Cloud? 1. "Owning your computers is as passe as owning your own energy generator" 2. "Do more with less"
Do: Find one or more strategic cloud partners and begin pilot outsourcing
Don't: Buy more hardware or software to use the cloud. It's snake oil.
Laugh Nervously About: The observation that outsourcing has been a panacea for IT's woes for over 15 years, and last we saw, it seemed like a shell game.

4. Efficiency through Consolidation

Why Cloud? "Your DC's Power, Thermal, and Hardware utilization are awful; you really could improve that. Virtualization was the start, this is the next step"
Do: Buy Cloud Management & Data Center Automation software, use a Cloud Services partner/SI, keep maturing your use of virtualization.
Don't: Really jump into Cloud Definition #3 until your own house is in order.
Laugh Nervously About: The extra software you're expected to buy, and that it seems to require extra hardware too. "Won't Get Fooled Again" by The Who seems like an apt theme song, particularly the final verse.


5. Process Networks

Why Cloud? 1. "The next-generation of the Internet that will tie together process specialization, information integration, social networking, and contextual data" 2. this is sort of where the "Web Services" vision, circa 2002, left off, after which time they made some poor investments in personal hygiene protocols and associated chicanery.
Do: Meditate on the Zen nature of this future evolution of the Internet. Sign on to Twitter. Attend lots of conferences with "2.0" in the title. Maybe buy a BPM tool, or invest in some Strategic Cloud Consulting Services. Clouds #1, #2, #3, and #4 may be useful on the path to nirvana.
Don't: Worry too much about technical details, it's all about your business anyway.
Laugh Nervously About: 1. That no one knows what the fuck these people are talking about, even though there's probably something interesting happening here. 2. That the paint is still wet on BPM vendors renaming themselves Cloud companies.

6. The Rise of Lean IT


Why Cloud? "Reduced lead times to enabling change in your IT environment, thus driving greater business value"
Do: Start redesigning your IT processes. ITIL v3 ain't bad, if you take it with a grain of salt. Pick up some IT automation and management software while you're at it.
Don't: Think that technology alone will solve your problems, this is mostly about organization & culture, baby.
Laugh Nervously About: 1. That the primary industries that have embraced Lean concepts are Automotive and Telecommunications, and the telcos have been talking about it for 10+ years with little sign they're really serious about it. 2. Agile/Lean proponents tend to be backed by a posse of folks that like to write manifestos.


In sum, a busy, talkative workshop, with reasonable attendance (I counted upwards of 50 at peak). A good mix of government and some industry (the only real missing speaker was HP, though they attended). I'm also continually thankful that Elastra gets invited to these sorts of events; it sort of validates that our approach is interesting to the larger players.

Many shapes & sizes of Cloud


A clear indicator of the maturing of the cloud community was that private clouds and hybrid clouds were on everyone's lips, and seem to have been essential to generating "enterprise interest" in the concept of cloud computing.

I recommend reading NIST's presentation on this topic, it's well thought out (I'll link to it when the proceedings are published).

It's not that private clouds are what people are exclusively interested in, it's that a) in the short run, they're the only game that's acceptable for Federal agencies or for conservative enterprises, due to very real security and compliance fears, b) even in the long run, the reality will be a hybrid cloud world, not a Big Switch, and c) the benefit behind clouds cannot be just about outsourcing, or else we're screwed, it's just an over-hyped sales pitch. At this point, I think those that say "Private Clouds are a distraction" are full of shit.

The benefit of clouds

In my talk (slides should be available soon), I discussed how Elastra views the benefit of clouds to business. In order of "difficulty to achieve": Primarily it's about resetting cost structure -- moving ongoing IT demand consumption to a variable cost structure, and freeing IT from viewing everything as massive fixed cost. Secondly, it's about drastically reducing the lead times to change one's IT infrastructure in response to demand. Thirdly, it's about increasing visibility of the IT infrastructure and how it ties to business results. Fourth, it enables a more precise, commoditized approach to outsourcing. These benefits have their roots in the overall move towards services-oriented computing, just that they're being applied inwardly.

Some might say that clouds "TODAY" provide a commoditized, precise approach to outsourcing. And I would say "sort of" -- the caveat is that everything is still proprietary, and that SLAs are mostly crap, the price structures don't work well except for 'very elastic uses' of scale, and there aren't many large-scale clouds that are viable alternatives to Amazon EC2 (though Windows Azure is getting there, and IBM is likely coming).

I don't really know if the attendees bought into my view, but I still think we don't talk about this stuff enough -- we're too busy futzing with technology.

What about PaaS?

These sorts of interoperability discussions have a hard time reconciling with Platform as a Service. The Salesforce.com talk was good, explaining their view of interoperability -- they allow you to share your data with other clouds & services, and that's their nirvana. But your implementation and custom logic is 100% proprietary. You might be able to get portable logic if you use a Model Driven Architecture approach that generates APEX code for you, but that's about the extent of today's possibilities.

Reuven was notably unhappy with this state of affairs. I chimed in and claimed that PaaS works well for departmental applications or vertical apps where "you just want it done and don't really care about customization lock-in". There are many, many examples of business applications in that category: witness every Microsoft Excel macro or Access database in the small, or any large-scale ERP, HR or CRM system. This is why Salesforce is a billion dollar company.

I know this will come as a blow to the "open uber alles" crowd, but sorry - enterprises care about reducing lock-in on their infrastructure. Not really in their applications. If enterprises start looking at IT more in application terms, and PaaS becomes "the way forward", we had better start dusting off our SOA, ETL, and EAI hats, because that's where the problem will always lie.

In fact, the biggest problem with Salesforce's stuff is not that it's locked-in to their software, it's that you can't choose to run it inside a private cloud. They're one of the stubborn ones against "private clouds", understandably. For this reason, I think Microsoft's approach to PaaS is probably going to be very successful. .NET is your platform. It's very popular, and will continue to be so because of Azure.

Standards ... If not today, when?

On the interoperability front, there was discussion about whether it was premature for cloud standards. And generally, the feeling was, yes it is -- BUT -- history has shown it's important to start the discussion early, and get people networking early, lest they go off and do their own interpretations of what people need & we wind up with a mess despite our best intentions. One of the analogies that Bob Marcus kept alluding to was the emergence of the Enterprise Service Bus in the SOA world, which emerged because even though we had the WS-* stack, things still weren't all that interoperable, and capabilities varied wildly, so a mediator became necessary. And the service buses themselves were all very different in their operation, so required specialized knowledge to install, use, and develop against.

A lot of the sessions are "here's what we're doing to help to get people to talk about standards", which is fine, but indicative of how "early days" all of this interoperability work really is. The general feeling can be summed up as:

a) Cloud Computing is still fuzzy, but has the potential to be great;
b) Clouds are mostly closed today, and that's OK, but not great;
c) A modicum of provider-level openness will be essential for the Federal community.

IOW, it's a huge mistake to assume that the EC2 API is a de facto open standard. I don't think anyone in the room had this illusion. Here's why, IMO: for one, it's a stretch to call it "open" -- it's under the control of a single company, and licensed by them. If they decide they dislike the software that implements that API, they can change the license for future versions, and shut down old versions, making older versions basically useless. Secondly, there are a number of core cases it doesn't cover. The biggest is that it doesn't give you the ability to express "desired state" of a cloud as a document. It's just an API. Whereas enterprises seem to want to be able to reuse their configurations, store them, verify, certify, sign, and version control them, etc. Hence the interest in document standards like OVF or hypermedia formats like EDML and ECML.

The problem is that if these standards take so long to build, then we're going to have to invest in "cloud service buses" to enable portability and interoperability. In a prior post, I mentioned that this is what I believe will probably happen. There are too many cooks in the kitchen.

The substantive discussion on potential standards included:

Winston Bumpus (DMTF President)'s talk on OVF
Mark Carlson's talk on SNIA's XAM initiative

And from the vendor side, there was:

Enomaly's UCI
Sun's Cloud API
Elastra's Markup Languages

All of which deserve a separate post.

My short takeaway was this: OVF is likely going to be very popular. We're going to regret its scope decisions eventually (i.e. a focus on install & deploy, and little else), and I think there's going to need to be proprietary extensions to enable its use in a "cloud" context, but as Winston called it, "it's the MP3 of the data centre".

Get enough people repeating that to themselves, and I think they'll have a marketing winner. If Woody Allen is right, and "80% of success is showing up", I'd say yes, DMTF currently is the leading candidate for becoming one of the premier cloud standards bodies.


I tend to think of interoperability as a gradient.

The old industry stalwart from the 1990's is what I'd call "runtime interoperability", wherein you could write a Java EE application, deploy it on a Java EE application server, and (with a questionable amount of tweaking), get it to operate. SQL was another attempt at this, with mixed success. The later CORBA standards tried too, with the Portable Object Adapter (POA). And clearly, the ISO/ANSI C runtime libraries have been successful, as have many other programming libraries.

The other angle of interoperability, which grew in the 2000's, is what I'd call "protocol interoperability", an approach that, at first anyway, only a network engineer could love. Most of the *TP's on the internet take this approach, where the "network" is first, and dictates the pattern of interaction -- the "developer" and their desires or productivity is secondary.

With cloud computing, we're currently going through the age old discovery of "what form of interoperability makes sense?". Especially given that we're dealing with networked applications (indicating a need for "protocol interoperability") but also with complex configuration & resource dependencies for security, scalability, etc. (an area where "runtime interoperability" usually plays).

Starting Observation: Microsoft Has A Clue.

Windows Azure is trying to balance these approaches to interoperability. For example, .NET Access Control Services allow you to federate identity between your own Active Directory and Azure. This is all just Active Directory Federation Services (ADFS) and using the WS-Federation "standard"; something you could do with OpenSSO too, for example, for over a year. But they'll probably make it easier if you stick within the .NET / Windows world.

A similar case could be made with their .NET Service Bus, as a way of enabling Windows Communication Foundation and BizTalk applications to span Windows Azure and private deployment(s). This isn't just a pipe dream, either; they're actively doing this with the early Azure releases.

The Scope of Interoperability

What makes this work is that .NET is already a widely used platform in private data centers, and that .NET is a single-source runtime. Now, an astute observer may exclaim, "but that's not interoperable! Where is the multi-vendor ecosystem!?" At which point we have to ask ourselves, what's the scope of desired interoperability?

Is it:

- A vendor ecosystem of interoperable runtimes? Ponder the success and market results of SQL, Java EE, etc. before wishing for this. Where did they make a difference? (They did make a difference, but perhaps not where one would intuitively think.)

and/or

- The ability to enable multiple providers to host a single runtime and enable interoperable "services" (e.g. identity, data, process, etc.) across these runtimes?   

I suspect the latter is more readily attainable, and likely higher value, than the former. And note it doesn't preclude the existence of an ecosystem. It just suggests that enterprises are going to care more about cloud-spanning functionality in their "chosen car trunk" than about waiting for a common runtime to emerge.

What are the alternatives for a "hybrid cloud" platform to .NET and Azure?   

  1. Force.com APEX might work if they invested in private deployments -- not likely.
  2. There's Java, though Sun, IBM and Oracle haven't been doing much there yet.
  3. There's EngineYard starting down the Enterprise Ruby on Rails path.
  4. Google perhaps heading down the Enterprise Python path (also not likely)
and of course, everyone's favorite...
  5. Infrastructure as a Service, where you could write your infrastructure in Erlang and OCaml for all your cloud provider cares (so long as you don't use multicast ;-)

In this last case, runtime interoperability would require a lot of "roll your own" configuration management, integration, and interoperability. Or you could rely on...

  6. So-called "Cloud Servers" (e.g. CloudSwitch, 3Tera, Elastra, etc.)

Which give you ways to help craft models & designs & orchestrations that help you with configuration management, integration, policy, interoperability, and governance.  Which in essence is just like what the Hybrid PaaS guys are doing above:  constraining the problem space to gain some level of deployment flexibility.   The difference is that cloud servers boil the problem down to a (hard) configuration management problem, instead of building "a standard runtime to rule them all". 

Naturally, because I work at one of those "cloud server" vendors, you'd think this is my preferred model. But honestly, I'd be pretty happy for the industry if they agreed on either model. Time will tell.

My Predictions

a) I have serious doubts about a "new" cloud runtime portability standard. The battle lines were drawn long ago, and while they'll blur, it will likely continue to look like ".NET vs. Java vs. everything else" for 2+ years.

b) One could argue it would be nice to build a standard protocol (using an architecture that fits how early adopters think) for Infrastructure as a Service to provision "Obvious Stuff" like storage, CPU, and network.   The DMTF CIM stuff is great but probably too low-level,  and too WS-* focused to be palatable to early adopters.   The DMTF OVF stuff is likewise great, but isn't focused on "lifecycle", i.e. what the heck happens to this deployment over time?   It's (thus far) focused on creating virtual appliance bundles.

Something RESTful would be nice to enhance our serendipity, but frankly the EC2 API isn't all that great of a starting point (for several reasons; different discussion though).    

Regardless of the architecture, the big win here would be that it would reduce the need for a "Cloud Service Bus" that mediates among different APIs.   I think this kind of standard will happen, but it will take 2+ years, thus being ratified just as early adopters have bought their shiny new Cloud Service Bus.....  ;-)

c) A wild card is where the "massively parallel processing uber alles" crowd will flock. From what I can tell, there are four visible options:
(i) Hadoop (i.e., Java; though I bet there's a .NET port coming), 
(ii) Open Grid Forum / Globus,
(iii) Parallel SQL Database (e.g. Vertica, ParAccel, Greenplum, etc.),
(iv) Proprietary Platform (e.g. Google)

And those cases are very clearly NOT likely candidates for hybrid-cloud interoperability below the service-API level, given the latency requirements and tight coupling inside those services.

d) Even in a world of a handful of interoperable Cloud Platforms, I suspect there's still going to be a big configuration management and governance problem.

Where does Elastra's work fit into this?

Well, first, let's be clear: I have modest expectations for our work on EDML, ECML, etc. I don't expect them to become standards. I do hope to contribute the work we've put into this stuff into a standards effort, and that the industry really does adopt a RESTful linked data approach to describing IT. On the other hand, we've been at this for over a year, and I doubt there will be industry consensus on even 20% of the topics we're modeling. EDML, with its emphasis on resource reservation, allocation, packages and settings, sure, I could see value. ECML, however, is an architecture and policy description language; I suspect there's a lot more acrimony awaiting in there.

Secondly, I have no illusions about the ability for a startup or even a "community" of individual contributors to influence or fund a standard with this much industry attention. Large companies will get their way, invariably. Good ideas may survive through a combination of luck, serendipity, and maybe small doses of charisma and chutzpah among the evangelizers.

So, I will continue to show up at the occasional interoperability or standards meeting, post on mailing lists, etc., but otherwise I'm focused on our product suite. We built these languages to get our jobs done, and are happy to open them up when we have the time to complete the documentation for a wider audience. For now, I'll be presenting highlights at the OMG cloud interoperability meeting in March.

Previously on the hit game show, "What's a Cloud?"

Over on RedMonk, I heard a very intriguing quote from James Governor that was buried in video:

"If you think of the post-SOA term, from Nick Gall... Web Oriented Architecture, clearly this is somewhat different from SOA, although there are some patterns common to both of them..... Is the cloud Web-Oriented Operations, or WOO? (We have WOA and WOO)... and what IBM is saying is definitely not WOO, it's business as usual, it's just about flexible delivery of application -- all the stuff that is goodness, all the stuff that Tivoli has been talking about since 1995. That stuff all has value, but it's not Cloud. Cloud involves difference. Business as usual, that's just provisioning service, and automation and virtualization, which is all good, but... if I hear another person tell me that CLOUD = SOA + VIRTUALIZATION + AUTOMATION, I'm going to ignore them and rubbish the idea as much as I can."

Preaching to the choir here; I left BEA almost a year ago to build out a WOA platform for clouds.

But I'm curious -- as far as I can tell, most clouds really ARE some combination of:


  1. Service Interface (e.g. Amazon Web Services aren't really WOA up close)
  2. Provisioning and Automation of some sort (e.g. images, web applications, multi-tier designs)
  3. Virtualization is admittedly optional, though increasingly common

So, if cloud is different, is the difference really a trend towards WOA, or is this really going to happen? I see two patterns:

One pattern is emerging from the IBMs and HPs of the world that have collected a number of shiny baubles in their ERP4IT stacks (Tivoli and OpenView) and invested heavily in SOA and WS-* to painstakingly integrate them (and the pile of IT that has been built on this technology over the past 5+ years). This pattern indicates that the IT world is cleaving in two, with web architecture on one side, to build the new class of end-user services, and boring old SOA+VIRTUALIZATION+PROVISIONING for the back end.

The other pattern is that the cloud is about Web Architecture end-to-end, using WOA to enable linked data and mashups for the development lifecycle, architecture & operations lifecycle, and end-user services.

I wouldn't bet on the latter being a fait accompli, as most haven't wrapped their heads around how to make this work. And of course, there's a lot of inertia. There are bright spots: notice one of those links comes from IBM Rational's Jazz / OSLC initiative - they seem to "get the Web" for enabling interoperable software delivery lifecycle tools. But the problem is end-to-end. At some point the industry has to recognize that IT is becoming complicated enough that planning for product-line-style reuse is of isolated value, and designing for serendipity and applying knowledge representation principles at global scale are legitimate ways out of this mess.

A lot will depend on how this changes the cost and user experience of the ball-of-IT-mud (and whether that can be effectively communicated to those who don't follow the latest architecture acronyms).

I attended a Cloud Interoperability Forum in Mountain View yesterday, hosted by Stephen O'Grady from RedMonk and David Berlind from InformationWeek. I counted roughly 50-60 in attendance, with a moderate drop-off after lunch.

Twitter stream is available under #cloudinterop.

Here are my takeaways, the day after....

Cloud Taxonomy: aka "What we have here is, failure to communicate"

TL;DR version of this post: We think we know what we're talking about when we discuss "cloud computing". We really don't know what we're talking about - there's a lot of confusion, and it's rapidly becoming a marketing term. Thus, a taxonomy would be useful, if we're ever going to foster interoperability or portability.

Out of everything, I think the desire and will to build a taxonomy was the main outcome of the meeting.

Diversity of Clouds

Clouds come in many shapes and sizes. Infrastructure, developer platforms, storage services, etc.

There's a groundswell of "me too" infrastructure-as-a-service cloud plays, and they're the ones that want/need interoperability the most. I worry that this tends to drown out the conversation, and I'm not sure that this is what customers really are after (more on this later). The two Google App Engine guys (Architect & PM) in the room left after lunch, from what I could tell.

Interoperability at a platform level like Google App Engine or Salesforce becomes just like good old data integration - ETL, EAI, SOA, REST, etc. Some in the audience seemed to want to solve this latter problem (which seems, politely, a high hill to climb).

I spoke up and noted that we should try to understand the areas where there is broad agreement, and the areas where there is no broad agreement, and focus on the former. Because otherwise we're just going to wind up with a messy niche. This was echoed by several participants.

Even with areas with broad agreement we're going to have a lot of work to do, weighing existing standards against their old assumptions which may (or may not) apply. For example, "Cloud Storage" was brought up as an area in need of standardization. But, at what level? Management, provisioning, monitoring, etc? Should it be a high-level API? Or something more like the specs that the SNIA has put out? All of this requires a lot of thought as to the intended audience and the scope of use cases.

Openness, Ideology, and Standards

Bob Sutor, of IBM, stood up to speak to his experience of previous standards efforts. Two points struck me as debatable:

1. "The days of making boatloads of money on locked in technology are gone -- you're not going to get a patent and sit on it."

I agree with this, to some degree, but I think it may be misleading. It's easy to say that "nuclear weapons are no longer effective" when you sit on the largest stockpile of them. IBM has (and continues to collect) the world's largest patent library. And most of their software portfolio is proprietary, and will likely remain so.

No question, open standards and open source implementations are essential, but the issue is figuring out how to balance collaboration, adoption, and the desire to make money by (in part) excluding competitors. "Commercial open source" companies do this by offering proprietary add-ons. Even Red Hat does this, by excluding 3rd party distributors from using its trademark. You'll also notice most ISVs certify their software on RHEL, not CentOS... as intended.

2. Bob urged caution in the tale of REST vs. WS-* to avoid ideology in developing cloud standards. Despite misgivings, "A lot of people made a lot of money on WS-*".

Firstly, I respectfully think this is a misunderstanding as to the role of ideology in standards making. That sword cuts both ways - is all I'll say.

Secondly, I think that it shortchanges the importance of architecture when defining interoperability standards. Do you build a Cloud API? Or a hypermedia format? Or a document exchange protocol? Or a data schema?

These things lead to drastically different market and business results, and depend on decisions made in the first day - so-called "ideological" decisions such as "what's your architecture?". If all you want is cloud providers to use the same API, I'm not even sure that's the main problem. Sure, it helps small providers in a small, burgeoning ecosystem, but I don't think that's what enterprise IT cares about yet, primarily.

IMO, standards bodies are dangerous affairs for small companies. It's rare that they have a tall seat at the table.

Open Implementations vs. Open Standards

A minor bun fight ensued related to the frustrations of market dynamics vs. building software that one can rely on beyond the lifespan of a company, or if the company has a policy you don't like.

Tim Bray noted that there is a visceral fear of lock-in among many of the companies he talks to. "Substitutability is everything". A senior tech executive from IBM noted that "substitutability focuses on a very narrow set of problems though - enterprise IT and CIOs have an integration problem to deal with".

Followed by various comments from the audience:
"The cost of an Oracle maintenance is too much to deal with".
"Yet few are switching away from Oracle for new deployments."
"..."

I suspect Sun's acquisition of MySQL likely has something to do with the above discussion.

I don't really think there will ever be a resolution to substitutability vs. lock-in: it's a fundamental market dynamic that will be played out repeatedly in different ways.

Anyway, it seems we're back to the old nugget of standardizing for Interoperability vs. Portability, which I recall was the argument for WS-* over EJB back in the late 1990's. EJB supposedly gave you portability, and RMI/IIOP was what gave you interoperability, and it wasn't good enough because it (realistically) preferred Java on both ends. SOAP/XML was language agnostic and, better yet, supposedly "ideologically agnostic", so that VB developers would play as equally as C++ developers and Java developers.

At best these have both been "modest" successes. I lean towards believing that interoperability is the one that actually has lasting business impact -- reducing transaction costs. Portability can do that too, but it's much more case-by-case. We really should be careful as to which we prioritize, and in what area.

A second thread of discussion was on how hard it is to build an open standard, and how difficult it is for one to actually gain traction and become successful. One attendee suggested that open source implementations are a more effective means of interoperability: since an implementation is a working mechanism, it doesn't have to be (badly) interpreted by several organizations.

But this too has problems, which I and several others pointed out:

a) you CAN get locked into open-source software - switching costs are still pretty high, based on how dependent you are. What happens if the project is taken in a direction you don't like? What happens if it doesn't address your needs? Well, you fork.... which leads us to:

b) If there realistically can't be ONE open source project for an area of cloud computing, there likely will be several. That don't interoperate, or aren't portable.

Which leads us back to the need for open standards with (at best) reference implementations.

Interestingly both the Chairman and President of the DMTF were in attendance and were actively trying to foster dialogue, particularly around the need for a cloud taxonomy.

"Identity" and "Trust" are rat holes of epic proportions.

A significant chunk of the meeting was discussing the ability to carry federated identity across cloud providers. I chimed in that I think more important is carrying identity from location to location in one's application.

I know this topic is near and dear to James Urquhart, and I agree that it's crucial for long-run adoption of a multi-provider marketplace. I unfortunately think that reality is quite a long ways off.

But, this is a problem that goes beyond clouds, and I'm not sure this audience was the right one to wrangle with it.

We have plenty of answers, but we aren't asking the right questions, yet.

The audience was largely falling into the trap of being technologists rushing to solutions without thinking through problems and the audience they're targeting.

There was a focus by many to "scratch personal itches". Which is all well and good, but that's what open source projects are for, arguably, not standards bodies.

There were a few comments of a dislike or disinterest in "academic standards" that will try to do too much. I caution that what is academic in one person's eyes is essential in another's. And sometimes people mistake "academic" for "breadth" or "ambition". Are SNMP MIB's academic? They certainly look pointy-headed, until you realize how pervasive they are. How about all the CIM schemas at the DMTF? Aren't they useful? What about OVF? Certainly it doesn't do much today, but I bet they have broader plans for it.

Finally, there wasn't much discussion about what Enterprises or CIOs want, despite the attempts of some audience members. Which to me, is the biggest concern - above the needs of the "cloud ecosystem" of small vendors, or the frustrations developers get when using today's cloud platforms. We need to focus on what businesses actually want out of this technology.

My final presentation for the week is up on SlideShare.

At first I was worried this might be a bit too "out there", but feedback has indicated the presentation was well received and provocative in the right ways, particularly in getting people working on REST to talk more about areas that have usually been punted, such as the design of "interaction-oriented" interfaces in hypermedia, using POST.

Great seeing Jim and Steve again, and meeting Leonard and Ian for the first time. Though they left the conference before my session, I also briefly got to meet Tim and Mark Nottingham, two leaders in this space whom I've long respected.

About Me
(C) 2003-2010 Stuart Charlton

Disclaimer: All opinions expressed in this blog are my own, and are not necessarily shared by my employer or any other organization I am affiliated with.