Recently in Tech Category

Part of my continued series of reflections on the WS-REST workshop.

It may continue to amaze, but the most basic aspects of interoperability continue to have a common problem: the implementation almost always violates the protocol specs that it supports. And that violation often gets encoded into practice, and back into the spec. Sometimes this is for good reasons, sometimes not.

Sam drew from his experience with early SOAP interoperability to illustrate the problem. Apache SOAP, for example, a descendant of IBM SOAP4J and the predecessor to Apache Axis, supported the early SOAP spec from the time it was a vendor-led W3C note. It encoded data using the SOAP encoding section of the SOAP specification, because the XML Schema data type system was not due out until mid-2001.

Of course, early specifications often under-specify, and that leads to implementation details leaking through. A classic borderline case is representing "Infinity". XML Schema specifies that infinity should be represented as the characters "INF". Apache SOAP, predating this, relied on Java's java.lang.Double, whose toString() method generates "Infinity".

The fix is simple enough, but there were deployed systems out there already. The solution was to continue to produce "Infinity", but to accept both "INF" and "Infinity". Later SOAP stacks, like Axis, which were built with XSD in mind, both produced and consumed "INF".
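A minimal sketch of that "produce the legacy form, accept both" policy, in Python rather than the Java of the original stacks. The function names and the handling of the negative forms ("-INF", "-Infinity") are my own assumptions for illustration, not Apache SOAP's actual code.

```python
import math

# Lexical forms accepted on input (the liberal half of the policy).
_POS_INF = {"INF", "Infinity"}
_NEG_INF = {"-INF", "-Infinity"}  # assumption: negatives follow the same pattern


def parse_double(text: str) -> float:
    """Accept both the XML Schema form and the legacy Java form."""
    token = text.strip()
    if token in _POS_INF:
        return math.inf
    if token in _NEG_INF:
        return -math.inf
    return float(token)


def format_double(value: float, legacy: bool = True) -> str:
    """Emit "Infinity" for legacy peers, or "INF" for XSD-based peers."""
    if math.isinf(value):
        word = "Infinity" if legacy else "INF"
        return word if value > 0 else "-" + word
    return repr(value)
```

A legacy-compatible stack would call format_double(x) with the default, while a stack built around XSD would pass legacy=False; both can share parse_double.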

My first reaction to a lot of this was, "conform to the (damn) specification". If it were only so simple. Sometimes the spec is wrong or incomplete. Sometimes a boundary condition is missed by implementers. And organizational inertia makes it hard to upgrade. So, documenting what is implemented becomes the preference.

None of the above is really any different with REST. One of the takeaways: contracts matter.

I know that there's trepidation about the term "contract" among those in the REST community, in that it may suggest a rigid interface. I think it's a price worth paying -- you cannot eliminate the notion of "contract" when interoperability is a concern. Contracts are not about how rigid or flexible the interface is. They are about ensuring complete, understood, and compliant expectations. And, frustratingly, they cannot be well-specified if there isn't any implementation experience.

One might argue that the Web has proven very good at composing contracts together -- transfer protocol, media types, identifiers, etc. It didn't eliminate the painstaking need for agreement; it depends on it. Those interested in philosophy might find this an interesting paradox, that the explosion of freedom & diversity of web content and applications requires strong control at the lower layers -- an idea Alexander Galloway explores in his book, Protocol: How Control Exists After Decentralization. Practically speaking, the areas of the Web where there was lax specification and/or complex implementation compromises are the areas where there's not a lot of diversity, such as browsers, which have to deal with HTML, CSS, JS, etc. Servers haven't been immune from this, but they have somewhat more understandable specs to follow, with HTTP, MIME, URI, etc., though ambiguity with HTTP has led to a dearth of complete HTTP 1.1 deployments (most use a subset). This has led to the work on HTTPbis to clean up and clarify the spec based on over a decade of experience.

Considering today's new generation of RESTful APIs on social media & cloud platforms, there are some curious trends. JSON-based RESTful services, for example, do not specify dates or formal decimals. XML (schemas) do have standardized simple types, but the syntax and infoset are off-putting due to their lack of fidelity with popular programming languages. Which is better for interoperability? Maybe the answer is that, because JSON is so easy to parse, it's OK that each endpoint needs to adapt to a particular set of quirks, like differing date/time formats. But that sounds like a very developer-centric argument, something that serves the lazy side of the brain. We can do better.
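To make the "each endpoint has its own quirks" cost concrete, here is a small sketch of what a client ends up carrying around. The three endpoint names and the date format assigned to each are hypothetical; the point is only that the same instant needs a different parser per API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _from_iso8601(value):
    # e.g. "2010-05-03T12:00:00+0000"
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


def _from_epoch_seconds(value):
    # e.g. "1272888000"
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def _from_rfc822(value):
    # e.g. "Mon, 03 May 2010 12:00:00 +0000"
    return parsedate_to_datetime(value)


# Hypothetical endpoints, each with its own date convention.
DATE_PARSERS = {
    "api-a": _from_iso8601,
    "api-b": _from_epoch_seconds,
    "api-c": _from_rfc822,
}


def parse_date(endpoint, value):
    """Normalize an endpoint-specific date string to a UTC datetime."""
    return DATE_PARSERS[endpoint](value).astimezone(timezone.utc)
```

Every new endpoint means another entry in that table, which is exactly the per-API adaptation a standardized date type would avoid.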

Part of my continuing series of reflections on the REST workshop at WWW 2010...

Sam Ruby's keynote began with a call for technologists to back off the extreme positions often taken in technical arguments (likely because he's knee deep in the middle of the mother of all technical standards debates). Using Mark Pilgrim's parable of the web services spec developer fighting to the death over Unicode Normalization Forms, he illustrates that technical advocacy can lead to knee jerk, us vs. them reactions on very minute details, especially when different mindsets collide and have to work out a path to interoperability.

Relating this to the "REST community" (whatever that means), I think we seem to have a built-in defensive posture, and aren't quite used to being in the driver's seat for innovation. There were so many attempts to "fix" the Web into some other architecture: HTTP-ng tried with a distributed object model, the Web Services crowd tried again. The slow counter-revolution of "using the Web" was largely grown out of Mark Baker's health and savings account. And now REST is a technical pop culture buzzword, being perverted in meaning as all buzzwords are prone to. Roy's gotta defend the meaning of the word if it's to mean anything. So, in a way, it's completely understandable why RESTafarians honed a defence mechanism against misleading arguments that they fought ad nauseam.

Unfortunately, today, this defence mechanism seems to have limited free discussion of the limitations of REST, the areas of useful extension, the design of media types, and the ability to relate or objectively discuss extensions to REST that may evaluate the benefits of decades-old ideas like transactions, reliable messaging, or content-level security, without being patronizing ("you don't need transactions", or "reliable messaging isn't really reliable", "use SSL, it's easier", etc.). Researchers and standards working groups try to follow the style, or publish extensions, but anticipate the dreaded Roy bitchslap, perhaps because there seems to be a lack of resources (tools, documentation, frameworks, examples in different use cases) to learn how to apply the style, or perhaps because people are being lazy in applying the style (glomming onto a buzzword). Hard to tell.

In short, this is a hard problem to resolve, as there are good reasons for people's strong opinions. Compromise can only be achieved if there's a shared understanding of the facts & shared values. Mostly I see shared values in terms of the properties that people find desirable in a networked system (interoperability, extensibility, visibility, etc.), but we still seem to be missing a shared understanding of the facts (such as the basic idea of "emergent properties" in software architecture, and the idea that they can be induced through constraints -- which I think many still reject).

To be fair, there have been plenty of extensions out there, including the ARRESTED thesis from Rohit Khare, various public RESTful APIs and frameworks, Henrik Nielsen's use of RESTful SOAP services in Microsoft Robotics Studio, etc.

Look at the reference to the latter, for example. There are plenty of "specific methods" being defined while fitting under the POST semantics. Some would have a problem with that, but I'm not sure what the alternative would be. There are "state replacement" methods under the PUT semantics if you prefer that. But some clients may prefer the more specific, precision state transfer of a POST method. Would you consider this heresy? Or would you stop and realize that Henrik was in the front row of the WS-REST workshop, and co-authored too many W3C notes and IETF RFCs to enumerate here?

I'll speak more about this problem with the "write side of the web" soon, but in short, we don't really have a good way of describing the pre-conditions, post-conditions and side-effects of a POST in a media type, which would probably be the more "RESTful" thing to do. Mark Baker took a stab at this 8 years ago, but there hasn't been much pickup. In lieu of that, the simplest thing might be to make an engineering decision, slap a method name & documentation on the URI+POST interaction, and move on. You'd be no worse than a WSDL-based Web Service, with the added benefit of GET and URIs. Or, depending on your use case, it may mean crafting a full media type for your system, though I'm not sure that's "simple".

This dilemma is why I tend not to have a problem with description languages like WSDL 2.0 or WADL -- they're not complete, certainly, but they're a start to wrapping our heads around how to describe actions on the Web without breaking uniformity.

Most of those I spoke to at WS-REST felt that media types, tools, and frameworks for REST have been growing slowly and steadily since 2008, but lately there's been a major uptick in activity: HTML5 is giving browsers and web applications a makeover, OAuth is an RFC, several Atom extensions have grown, Facebook has adopted RDFa, Jersey is looking at Hypermedia support, etc.

But, coming out of the WS-* vs. REST debates, it is interesting to note what hasn't happened in the past two years, since the W3C Workshop on the Web of Services, which was a call to bridge-building. As expected, SOAP web services haven't really grown on the public-facing Internet. And, also as expected, RESTful services have proliferated on the Internet, though with varying degrees of quality. What hasn't happened is any real investment in REST on the part of traditional middleware vendors in their servers and toolsets. Enterprises continue to mostly build and consume WSDL-based services, with REST being the exception. There have been, at best, "checkbox" level features for REST in toolkits ("Oh, ok, we'll enable PUT and DELETE methods"). Most enterprise developers still, wrongly, view REST as "a way of doing CRUD operations over HTTP". The desire to apply REST to traditional integration and enterprise use cases has remained somewhat elusive and limited in scope.

Why was this?

Some might say that REST is a fraud and can't be applied to enterprise integration use cases. I've seen the counter-evidence, and obviously disagree with that assessment, though I think that REST-in-practice (tools, documentation, frameworks) could be improved on quite a lot for the mindset and skills of enterprise developers & architects. In particular, the data integration side of the Web seems confused, with the common practice being "make it really easy to perform the required neurosurgery" with JSON. I still hold out hope for reducing the total amount of point-to-point data mapping & transformation through SemWeb technologies like OWL or SKOS.

Perhaps it was the bad taste left from SOA, or a consequence of the recession. Independently of the REST vs. WS debate, SOA was oversold and under-performed in the marketplace. Money was made, certainly, but nowhere near as much as BEA, IBM, TIBCO, or Oracle wanted, and certainly not enough for most Web Services-era middleware startups to thrive independently. It's kind of hard to spend money chasing a "new technology" after spending so much money on the SOAP stack (and seeing many of the advanced specs remain largely unimplemented and/or unused, similar to CORBA services in the 90's).

Or, maybe it was just the consequence of the REST community being a bunch of disjointed mailing list gadflies, standards committee wonks, and bloggers, who all decided to get back to "building stuff" after the AtomPub release, and haven't really had the time or inclination to build a "stack", as vendors are prone to.

Regardless of the reason, the recent uptick in activity hasn't come from the enterprise or the enterprise software vendors offering a new stack. The nascent industries that have invested in REST are social computing, exemplified by Twitter, Facebook, etc., and cloud computing, with vCloud, OCCI and the Amazon S3 or EC2 APIs leading the way.

The result has been a number of uneven media types, use of generic formats (application/xml or application/json), mixed success in actually building hypermedia instead of "JSON CRUD over HTTP", and a proliferation of toolkits in JavaScript, Java, Ruby, Python, etc.

We're going to be living with the results of this first generation of HTTP Data APIs for quite some time. It's time to apply our lessons learned into a new generation of tools and libraries.

WS-REST Workshop Themes


The First Workshop on RESTful Design (WS-REST, hur hur) was held at WWW 2010 in Raleigh last week. I attended and was also on the program committee. It was great to finally meet a number of folks from rest-discuss and the #rest IRC channel, along with seeing again folks working to improve REST in the trenches.

This turned into the mother of all blog posts, so I've chunked it up into a series. The tl;dr version of my workshop takeaways is the following:

- Investment in RESTful technology is growing, particularly with cloud and social computing. Enterprise and vendor investment has been low.

- Technical disagreements and arguments from authority are counter-productive, and seem to be ubiquitous. Now is the time to build stuff, and make stuff interoperate, not split hairs. On the other hand....

- In building interoperable systems, sometimes domain experts are wrong, sometimes they're right. Sometimes it's better to document and standardize what works, sometimes that's not good enough.

In short, interoperability is hard, REST doesn't change that, better get a good helmet if you're in a group specification or standards body these days. The most important thing is if you're going to willfully violate someone else's idea of "the Right Thing", then be explicit and honest about it, so that it can be discussed. (This is, in part, the saga of the HTML5 standards effort.)

- REST has become a buzzword among those that aren't architects and don't want to be architects. That means there needs to be better documentation, tools, and frameworks; most developers can't derive the right design from first principles. Most RESTful APIs published today are not as interoperable as they could be; there will be a price to pay to maintain these.

- Having said that, for architects, progress has been slow in driving a true discipline for architecture, though there is some great work afoot to make this easier.

- Hypermedia tends to be a somewhat jarring user experience to those that still love their desktop applications, especially when we build small protocols like OAuth, but fall back on HTML to do other pieces like authentication and authorization policy selection.

- Developers tend to love the dynamic binding nature of REST so long as they're not the ones that have to maintain the client. We need a better way of handling dynamic binding (my suggestion is a work in progress) to hypermedia.

- It's high time to work on the "write side" of the web. We know that HTTP GET works well for interoperability. We have had less success with write actions, and need to think through interoperability there to a much greater degree than we have. Enterprise integration use cases, for example, could really benefit from describing POST interfaces for domain application protocols. The best we have is WSDL, WADL, and generic forms like XForms, HTML Forms, and generic publishing containers like AtomPub. We can do better.

- You can use REST in the enterprise, for legacy integration, successfully.

- Peer-to-peer systems are quite achievable in RESTful systems. The nature of hypermedia itself is peer-to-peer. And minor extensions to HTTP can lead to full data propagation style systems.

- Progress is being made to formally describe a RESTful Semantic Web, one that leverages the formal descriptions of tuple spaces with the polyadic pi-calculus and extends them to RESTful architectures.

- While HTML 5 is progressing to be a rich platform for the next generation web, there's experimentation happening for alternatives. Some drive traditional desktop UIs through RESTful resources, others are describing mixed reality worlds.

I have some forthcoming blog posts musing on these themes.


Building a hypermedia-aware client is rather different from building a typical client in a client/server system. It may not be immediately intuitive. But, I believe the notions are rooted in (quite literally) decades of experience in other computing domains that are agent-oriented. Game behaviour engines, control systems, reactive or event-driven systems all have been developed with this programming approach in mind.

The normal way we build clients, in a client/server architecture, looks something like this:
cs-programming-model.png


The logic of the application -- its objectives and how it wants to achieve them through one or more remote services -- is often procedural. A rich OO domain model is sometimes preferred to procedural logic, but this isn't usually used in conjunction with remote services because of the latency involved; a service facade coalesces communication into coarse-grained interactions.

This idea of a service facade culminates in SOA, where interfaces, along with all their possible message exchange patterns, are registered for others to lookup:

soa-programming-model.png

An agent-oriented client, on the other hand, looks something like this (which I've adapted from Russell & Norvig's diagram):

agent-model.png

The application agent has several pools of pre-defined logic:

a) Application Logic: some logic for the application itself (e.g. the basic states of a hypermedia application, the goals of the application if it has any. A browser has no goals other than rendering; whereas a product ordering & payment agent would have the goal of completing e-commerce transactions on behalf of a user)

b) Action Logic: some logic for the implications of actions (e.g. how does a payment & product ordering agent know, interpret, or infer that PUT/POST/DELETEing to a particular sequence of URIs will result in a paid product order)?

c) Protocol Logic: some built-in logic for handling protocols & media types (e.g. URI, MIME, HTTP, and maybe some mix of HTML, Atom, Atompub, etc.).

The problem of bridging application and action logic together is known as Action Selection. Action selection doesn't require fancy algorithms. Its study has often dealt with complex subject matter, which has often led to complex solutions. But in most agents, the bread and butter for action selection is simple: the Finite State Machine (FSM). An agent responds to changes in the environment based on its current state and a set of known transitions. There are other approaches to agent programming that are growing in popularity, like planners, but let's start with FSMs.
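As a rough sketch of FSM-based action selection: the agent keeps a current state and a table of known transitions, and each observed event selects the next state and the action to take. The states, events, and action strings below are hypothetical, loosely modelled on an ordering & payment agent; they are not taken from any real media type.

```python
# Transition table: (current state, observed event) -> (next state, action).
# All names here are hypothetical and for illustration only.
TRANSITIONS = {
    ("browsing", "order_link_found"): ("ordering", "POST order"),
    ("ordering", "order_created"): ("paying", "PUT payment"),
    ("paying", "payment_accepted"): ("done", None),
    ("paying", "payment_rejected"): ("ordering", "GET order"),
}


class Agent:
    def __init__(self, state="browsing"):
        self.state = state

    def sense(self, event):
        """Select the next action from the current state and an event."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            # Unknown event for this state: stay put and do nothing.
            return None
        self.state, action = TRANSITIONS[key]
        return action
```

The table is plain data, so in principle it could be supplied or extended at runtime from what the hypermedia itself advertises, rather than being hard-coded in the client.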

Firstly, an agent's application logic requires a state machine to describe the relationship between sensing ("safe") actions and changing ("unsafe") actions. In a hypermedia application it looks something like this:

state-machines.png

This basic hypermedia application state machine is sandwiched hierarchically between the super-state machine for the application's goals and the sub-state machines for the protocols:

state-sandwich.png


The trick with building a RESTful agent based on FSMs would be to figure out a way such that

a) The application's goals can be expressed in terms of hypermedia agent logic (e.g. sensing & effecting)

b) The hypermedia types and link relations themselves contain enough interpretable action logic that can be mapped to the application's domain

c) The action and protocol state machines are modular. RESTful applications tend to have a standardized and relatively small number of generic protocols, so they need to be repurposed for different applications and/or contexts.

Two ways of accomplishing this include hierarchical FSMs and behaviour trees.

Hierarchical FSMs are popular in control systems and game engines. They are great for reactive systems, where the correct interpretation & response to input and events is the intent of the application. Managing call control, or a climate control system are examples. There are powerful generic Hierarchical FSM standards out there like SCXML that provide a code-on-demand approach to interpreting and managing states across a set of resources (though it probably could use some RESTful-friendly polish).

Behaviour trees have the same power as hierarchical FSMs, but tend to be more oriented towards goal-based applications, where the purpose of the application is to transition a set of resources to some new state. A calendar scheduling agent and a payment & ordering agent are examples of goal-oriented agents.

In future, I'm going to explore how to build a behaviour tree-based agent; probably for the Restbucks domain that Jim Webber, Savas Parastatidis, and Ian Robinson have been using for the past year or so and including in their "REST in Practice" book.



This is my attempt to summarize my thinking on RESTful versioning. It's a follow-up to Square Peg, REST hole. These can be tricky concepts to describe, and I don't really want to write a small book on this topic, so I may get some of this wrong. Thus, expect updates to this entry to improve it in the future.

Data Versioning vs. Language Versioning

Extensibility and versioning in RESTful services can be viewed in terms of two domains of agreement. The two domains are: resource and representation, which could also be thought of as the "data" vs. "language" domains.

First, let's recall what a resource is: a time-varying membership function, where the members are instances of a representation at various points in time. The resource can return different values at different times. BUT resources can be narrowed down into very specific semantics, if the resource owner wishes. A resource might be "the most recent version" of a record, whose state might change often, or it might be a "specific version" of a record, and thus unchanging in state. These are two different resources, even though they may have the same representation for a period of time. A resource may even contain format metadata and constrain the language emitted, though content negotiation may be preferred.

Regardless of how often the values change, the semantics of the resource should not change. "Revision 3 of purchase order 123" should retain that meaning. If the meaning does change, it hurts consumers that relied on the old meaning.

URI versioning is a design choice for cases where resources are immutable across time and we create new resources for state changes (similar to how we manage time-series data in a database).

With language extension or versioning, on the other hand, the state is unchanged, but the way that data is represented has changed.

On Language Versioning

Rule #1: Prefer to extend a language in a forwards and/or backwards compatible manner. Version indicators are a last resort, to denote incompatible changes.

Extension, of course, requires thought. It implies well-specified interpretation policies for language consumers, and in the case of a machine-readable schema, well-specified extension points. But the range of choices isn't too hard to understand.

This table summarizes the current techniques in practice for extensible or versioned languages, using the terminology from the W3C TAG's draft versioning compatibility strategies document, by David Orchard, which I'm going to butcher through my own brief summaries.

Backwards-Compatible
  Consumer:
  • Lookup version notifications
  Producer:
  • Replacement or Side-by-side
  • Version notification via out-of-band channel or links
Forwards-Compatible
  Consumer:
  • Must accept unknowns
  • Must preserve unknowns if persisting state
  Producer:
  • Version identifier substitution model
  • Media type specification clearly defines consumer forward compatibility expectations (and/or uses a machine-readable schema to denote forward-compatibility extension areas)
Incompatible
  Consumer:
  • Check for version identifier
  Producer:
  • Side-by-side or Breaking Replacement

Some explanations...

Version Notifications

Agents should be notified of new versions. This can be done out-of-band (email, physical letter, carrier pigeon), but it helps to complement this with links. These links could use an extended, agreed-upon link relation, and/or be defined as part of the media type specification. The links may point to a description of the version change, or, in the case of a Side-by-Side, the URI that emits the resource in the new language version.

Replacement

This implies that the origin server is replaced by a new backwards-compatible version that is able to accept both old versions and new versions of representations sent by a client (usually via a POST link). This is useful in combination with a forward compatible change -- none of the links need to change.

Side by Side

This implies that the origin server provides a new MIME type or URI-space for resources using the new language, alongside old resources using the old language. In either case, you are implying "this language changes everything". In the case of changing URIs to reflect the new language version, in effect, you're using "resource versioning", something usually relegated to storing time series data, as a means to work around your language compatibility problems.

To make this RESTful, your media type must include a link from the old resources to their new version, along with metadata indicating the version of the language used at the URI, possibly including a link to a machine-readable schema of the new version (if your media type has such a thing, like XML with Relax NG or XSD). In the case of a new MIME type, you would want a link relation that notes an alternate format is available.

Let me underscore this: You cannot expect clients to understand your URI format and swap out all occurrences of "v1" with "v2". If you do, you're placing a heavy burden of coupling on your client: that YOUR SERVER is so special that they need to understand YOUR URI format. This is completely antithetical to why we would want to use REST in the first place, unless you're really just tunnelling XML over HTTP for the heck of it. I note that many "REST APIs" out there actually are built this way, which means they're just as point-to-point coupled as other interface styles.

Must Accept Unknowns

If the consumer sees elements in the data it doesn't recognize, it still accepts the representation. Generally, it ignores these elements for processing.

Must preserve unknowns if persisting state

This is an optional follow-on from "Must accept unknowns", and is often forgotten. If representation state is being persisted (i.e. cached) in the consumer's domain for later use, the unrecognized elements should be preserved, and not stripped. This could greatly assist forward compatibility when the client is upgraded to handle the previously unrecognized elements.
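A small sketch of "accept unknowns, and preserve them when persisting", assuming a JSON representation and a consumer that only understands two fields. The field names and the known/unknown split are hypothetical; real media types would define where extensions may appear.

```python
import json

# Hypothetical: the only fields this version of the consumer understands.
KNOWN_FIELDS = {"id", "status"}


def load_order(raw: str) -> dict:
    """Accept the representation, setting aside fields we don't recognize."""
    doc = json.loads(raw)
    known = {k: v for k, v in doc.items() if k in KNOWN_FIELDS}
    unknown = {k: v for k, v in doc.items() if k not in KNOWN_FIELDS}
    return {"known": known, "unknown": unknown}


def save_order(order: dict) -> str:
    """Persist the state with the unrecognized fields put back, not stripped."""
    merged = {**order["unknown"], **order["known"]}
    return json.dumps(merged, sort_keys=True)
```

Processing only touches order["known"], yet a later, upgraded client reading the persisted copy still finds the fields an older client could not interpret.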

Version identifier substitution model

I defer to Section 5.3 of the compatibility strategies document.

Where do you place the version identifier?

In order of preference:


  1. In the media type content

  2. In the MIME type itself, or as a MIME type parameter

  3. In the URI

Version identifier inside the media type content

This has many examples in the wild, such as HTML DOCTYPE, some uses of XMLNS, a version identifier inside your PDF document.

This requires the replacement model for backwards-compatibility, and encourages the greater use of forwards compatibility. It's the way that most web media types have long worked, with varying degrees of success, but note that those formats were long designed with forward compatibility in mind.

It's still possible to combine this approach with side-by-side versioning if need be, especially if you are changing the semantics of your resources.

Version identifier in the MIME type

e.g. application/vnd.mytype;version=2

This is currently a non-standard and debatable technique. The benefit here is that this enables side-by-side versioning without impacting the URI-space. On the other hand, this reeks of avoiding hypermedia and trying to push things to the other layers of the Web Architecture (HTTP and/or URIs). But in many cases this is preferable to a new URI space.
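For illustration, a consumer-side "check for version identifier" against a media type parameter might look like the sketch below. The hand-rolled parser handles only simple name=value parameters; the set of supported versions, and the choice to treat a missing parameter as version 1, are assumptions of mine.

```python
def parse_media_type(header_value: str):
    """Split 'type/subtype;name=value;...' into (media type, parameters)."""
    media_type, *raw_params = header_value.split(";")
    params = {}
    for raw in raw_params:
        if not raw.strip():
            continue  # tolerate a trailing semicolon
        name, _, value = raw.partition("=")
        params[name.strip().lower()] = value.strip().strip('"')
    return media_type.strip().lower(), params


SUPPORTED_VERSIONS = {"1", "2"}  # hypothetical versions this consumer handles


def check_version(header_value: str) -> str:
    """Return the language version, or raise if it is one we can't handle."""
    _, params = parse_media_type(header_value)
    version = params.get("version", "1")  # assumption: absent means version 1
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported language version: {version}")
    return version
```

Here check_version("application/vnd.mytype;version=2") returns "2", while an unrecognized version fails early instead of being misread as a known one.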

Version identifier in the URI

e.g. http://example.com/data/v1/po/123

I described the primary problem here earlier: you can't assume you are a special snowflake and the client will know that 'v1' is your magic crystal. You must provide a link or a URI template in the media itself (and/or in a service resource) to denote new versions.
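A sketch of the alternative to string-swapping "v1" for "v2": the client looks for a link in the representation and follows whatever URI the server put there. The "links" structure and the link relation URI are hypothetical stand-ins for whatever your media type actually defines.

```python
# Hypothetical link relation; a real media type would define its own.
NEW_VERSION_REL = "http://example.com/rels/new-version"


def newer_version_uri(representation: dict):
    """Return the URI the server advertises for the newer version, if any."""
    for link in representation.get("links", []):
        if link.get("rel") == NEW_VERSION_REL:
            return link["href"]
    return None  # no link advertised: the client does not guess a URI
```

The client never inspects or rewrites the URI's path, so the server stays free to lay out its URI space however it likes.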

The secondary problem is bookmarks, or inbound hyperlinks. In a database system these are known as "foreign keys". Anyone with a relational data background knows that their primary keys really shouldn't change, as it's expensive to propagate that change to foreign keys.

There is, however, one case, where this approach is preferred over the others. This ties back to the beginning of this entry, when I discussed "Resource Versioning". It's clear we mint URIs when the semantics of the resource itself changes. So, if they change with the language, then mint new URIs -- using hypermedia, if possible, to link old concepts to new ones, as this requires a side-by-side compatibility approach.

For example, suppose we have an Account resource, and in a new version of our resources and language we are deprecating the notion of account and adding two new resources, "Customer" and "Agreement". It makes no sense to preserve the Account URIs for new Customer resources in this case, as the changed meaning would be confusing to clients expecting an Account.

Some Q&A

Aren't bookmarks the problem? Wouldn't life be better if we rejected bookmarked URIs?

Well, yes, they're a problem, but no, life would suck if we rejected bookmarks, because there's no difference between a hyperlink and a bookmark. It would be like saying "no one can hyperlink to me", which is absurd.

Wouldn't versioning be simpler if we separated access from identification, like with WSDL services?

If my data identifiers become opaque primary keys like 123 instead of http://example.org/po/123, then they're tightly coupled to the service that produced the document, as it would be the only context in which I could resolve details for that identifier. Now clearly one benefit is, if I create a new incompatible side-by-side service version, technically (assuming I don't need to re-key my database), the stored foreign keys don't change.

In a RESTful approach, URIs are your "foreign keys", and if you embed a version identifier in them, they need to change when you upgrade to the next version. Assuming you can't convince your resource owners to use languages with version identifiers as a MIME parameter or inside the language itself, how is that done?

In a word, lazily.

As I've discussed above, your media type should have an extensibility section or link relation(s) that points to the new version. And upon retiring a language at a particular URI, you would use a permanent redirect (301) to tell all consumers to update their bookmarks / foreign keys. In either case, the agent would have the ability to update their persistent reference.
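The lazy update can be sketched as follows: whenever the consumer next refreshes a stored reference and gets a permanent redirect, it adopts the new URI. The fetch callable is a hypothetical stand-in for an HTTP client, returning (status, headers, body), so the sketch runs without a network; it also follows only a single redirect hop.

```python
def refresh_reference(stored_uri, fetch):
    """Fetch a stored URI; adopt the new URI if it was permanently moved.

    `fetch` is a caller-supplied function (hypothetical here) that takes a
    URI and returns a (status, headers, body) tuple.
    """
    status, headers, body = fetch(stored_uri)
    if status == 301 and "Location" in headers:
        new_uri = headers["Location"]
        _, _, body = fetch(new_uri)
        return new_uri, body  # caller persists new_uri as its "foreign key"
    return stored_uri, body
```

Nothing is rewritten in bulk; each bookmark or foreign key is corrected the next time it happens to be used.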

Again, this is a special case -- there really shouldn't be that many incompatible versions. Most changes should be forward-compatible ones that don't require new URIs, unless you're completely mucking with the resource semantics.

In Summary

  1. Prefer extensible, forwards & backwards compatible languages and the replacement approach to compatibility. Note the W3C TAG's position on version identifiers
  2. Be judicious when you use version identifiers in URIs, as cool URIs don't change
  3. For side-by-side deployments, always include a section in your media, or link relation(s), to point to new/old versions, and update references lazily as the consumer refreshes its cached value. Use permanent redirects to retire URIs bound to old language versions.
  4. Version URIs if the semantics of the resource changed, but be courteous to consumers by ensuring links are available to denote the old vs. new alternates
  5. Chapter 13 of Subbu's wonderful new book RESTful Web Services Cookbook provides more detailed illustrations of several versioning techniques.

I need home for a REST



Time to dust off my microphone and bring up a couple of topics on REST and the Web Architecture
- Versioning, or "Cool URIs don't change -- but my data format does!"
- Why the Web Architecture could use a Programming Model for the Enterprise

Which I'll try to get to this weekend.

For now, I leave you with two things:

First, reflecting on William's recent Square Peg, REST Hole, I draw from the archives, What are the benefits of WS-* or proprietary services?

Let's keep our eye on the prize. REST is a style aimed at extensible, low entry-barrier, multi-organizational, confederate information sharing and communication. I note that most IT organizations are confederacies, adopting a federal or feudal governance model.

The Web Architecture itself (MIME types, HTTP, URIs) provides a much-needed stable intermediate form for interoperability among many different systems and applications -- something that a general-purpose orientation, like SOA, doesn't really provide. Or, fitting for RESTafarians, it is a shared hallucination ;-)

Not every system, or layer of an enterprise's architecture, has the same requirements for scalability or interoperability. The post from 2007 highlights such examples.

Secondly, the song, which ruled my college years in Canada....


The Trouble with NoSQL


I have an ambivalent feeling towards this NoSQL trend, on a few levels.

a) "RDBMS don't perform or scale".

I've seen this in presentations, blog posts, and even in the Hadoop O'Reilly book. I'm not sure if this is sloppiness, ignorance or plain dishonesty. To anyone paying attention, it's pretty clear that RDBMS do perform and scale: there are several 1+ petabyte Teradata implementations, and Oracle RAC is used heavily at Amazon.com (70 TB) and Yahoo! (250 TB), for example. Of course, this is about scalability in terms of data volume, and huge queries. On the OLTP side, the TPC benchmark continues to show Oracle and DB2 are able to pull out staggering numbers both in classic SMP and in clustered configurations (yes, DB2 can do shared nothing).

This is not to say RDBMS are the solution to all data persistence problems. I'm an old object database guy and there were (and are) many reasons why one would use that (or one of the newer scalable key/value stores like Cassandra). But, please present the technology on its merits, not based on completely misleading claims.

One almost gets the impression that "If it's not open source, it doesn't exist", which is absurd considering the billions Oracle, IBM, Microsoft and Sybase continue to rake in.

b) "In the CAP tradeoffs, availability > consistency, almost always"

Except when you're running financial analyses. Regulators don't like "eventually consistent" accounting statements. Even when you have terabytes of them to go through.

To me, the best approach would be to provide developers and data architects a knob to adjust the level of consistency vs. availability vs. partition-tolerance depending on the circumstance (the query, the data, etc.)
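
One way such a knob could look, as a rough sketch: in a Dynamo-style replicated store with N replicas, a read that waits for R replicas and a write that waits for W replicas are guaranteed to overlap when R + W > N. The function names and consistency levels below are hypothetical, not any product's API.

```python
def quorum_sizes(n, level):
    """Return (read_quorum, write_quorum) for n replicas.

    'strong'   -> majority reads and writes, so R + W > N
    'eventual' -> a single replica for each, favouring availability
    """
    if level == "strong":
        majority = n // 2 + 1
        return majority, majority
    if level == "eventual":
        return 1, 1
    raise ValueError(f"unknown consistency level: {level}")


def reads_see_latest_write(n, r, w):
    """Read and write replica sets must intersect when R + W > N."""
    return r + w > n


N = 3
strong = quorum_sizes(N, "strong")      # e.g. the regulator's report
eventual = quorum_sizes(N, "eventual")  # e.g. a page-view counter
```

The caller picks the level per query or per data set, which is the kind of adjustment argued for above.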

c) What happened to "Data Management"?

I'd be willing to sacrifice the "R" in RDBMS for certain reasons, but I'm less interested in sacrificing the "MS" part, i.e. "Management System".

There's an eternal battle between those that want the data intertwined with the code and those that want the data separate from the code. I grew up thinking the former, and learned to appreciate the latter.

Every generation of programmers seem to go through this phase where the next-gen persistence engine becomes all the rage. From CODASYL to ODBMS to XML databases to Object caches, and now key/value stores or "cloud databases".

Managing data, scale, and partitioning in the application is a workaround, not a very pleasing solution. I understand people have to get their jobs done, but enterprises seem to have different data management requirements than young companies. In most cases the data exists to support business operations or business decisions. Quality is paramount, and poor management leads to data duplication and mistakes when other applications need to access that data. One tends not to notice these problems early in the life of an application; they tend to emerge across applications that integrate with one another over time.

Similarly, "schema-less" data persistence is only beneficial early in development stages. Later on a schema becomes pretty useful, and over time, it's almost essential if you want to reuse or repurpose that data and be able to interpret it consistently without having to crack open the supporting codebase.

And strong DBAs have a unique perspective on data & performance management, one I've found lacking in many a programmer (with the ones at Google being a notable exception; they truly seem to have instilled the advanced DBA's sensibility into their engineering work).

d) Are you really sure that SQL is the problem?

I can agree that many of the cheaper (or free) RDBMS don't scale well. But why do people think SQL is the reason that they don't scale? It seems like conflating logical with physical issues. The traditional SQL RDBMS model may not be the only way to do logical data management, but relying on programmatic solutions and ad hoc query languages certainly isn't very satisfying; it all seems very 1970s. Throwing out logical data design & management implies horrible long-term consequences on data quality and correct modelling of a business domain.

On this note, there's a new paper contrasting Hadoop with parallel databases for large-scale data analysis tasks (written in part by the Vertica guys - Mike Stonebraker's new company). The conclusions are interesting -- Hadoop isn't the clear performance leader, but it certainly wins major points for being simpler to get going than your traditional DBMS. On the other hand, specifying SQL statements looks quite compelling vs. writing a bunch of map and reduce functions. And the results show these SQL databases certainly perform well on query (assuming you can load them fast enough). Another paper, by some of the same authors, looks at combining Hadoop with a parallel DBMS (probably Vertica) with encouraging results.
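
To make the contrast concrete, here is the same aggregation written both ways. This is a toy sketch in Python, using the standard-library sqlite3 module and plain functions in place of Hadoop; the visits table, its columns, and the sample rows are made up for illustration and are not taken from the paper.

```python
import sqlite3
from collections import defaultdict

rows = [("a.com", 10), ("b.com", 5), ("a.com", 7), ("c.com", 1), ("b.com", 2)]

# Declarative version: one SQL statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (site TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
sql_result = dict(
    conn.execute("SELECT site, SUM(revenue) FROM visits GROUP BY site")
)
conn.close()


# Programmatic version: hand-written map and reduce steps.
def map_fn(record):
    site, revenue = record
    yield site, revenue


def reduce_fn(site, revenues):
    return site, sum(revenues)


groups = defaultdict(list)          # the "shuffle" phase
for record in rows:
    for key, value in map_fn(record):
        groups[key].append(value)
mr_result = dict(reduce_fn(k, v) for k, v in groups.items())
```

Both produce the same totals per site; the difference is in how much of the grouping logic the programmer has to spell out.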

Having lived through the object database wars, and watched my beloved databases get trampled into a niche, my sense is this:

- The IT world didn't reboot when clouds came out. There are a lot of assumptions worth challenging, but lots to learn from history.

- There's a real chance most NoSQL solutions will remain niches while RDBMS continue to dominate, because customers will force their vendors to scale them out. The real question is whether this is an impossibility due to the relational model or SQL. I'd say that's highly doubtful. They'll find a way, if their customers pressure them.

- On the other hand, there's a real chance for a NoSQL alternative to make it big and succeed IF it evolves to be a true DBMS, not just a persistence engine, provides adjustable CAP tradeoffs in its interface, and offers us a worthy successor to SQL.

I've uploaded my position paper for the OOPSLA 2009 Cloud Design Workshop next week. This provides a detailed technical overview of what Elastra has been working on for the past year.

Cloud Computing has been a catalyst that has been accelerating a long-needed convergence between IT Operations and Application Architecture. We need to build systems to be operated, managed, and governed -- not as an afterthought. And we need better collaboration between IT specialists. Through a mix of web architecture and a dose of autonomic computing, we may have the beginnings of a new inter-cloud architecture. It feels like the end of a marathon, but we've only reached the first checkpoint.

There are at least six views on Cloud Computing out there, and why they're important. Some people are pretty adamant that their definition is the one true definition, others tend to admit the overlap. Optimists would call this state of affairs "synergy", pessimists would call it "vagueness", cynics would call it "sophistry".


I'd like to distill, briefly, how I see things.

1. Theme: Scale without skill, Availability without avarice

Why Cloud? "Don't worry about Scale or Availability, SuperCloudPlatform Will Take Care of It"
Do: Adopt a Cloud Platform, like Google App Engine, Azure, or Force.com
Don't: Worry about Infrastructure as a service, that's so .... 2006.
Laugh Nervously About: The Magic Architecture & Buzzword Bingo required to make this work. Also, all those PaaS APIs seem rather proprietary....

2. Gimme an A! A! S!

Why Cloud? "Consuming IT as an On-Demand Service instead of as a capital intensive product"
Do: Build out your cloud architecture, with its various layers, and invest in software & services at each layer.
Don't: Get locked into anyone's narrow concept of a cloud. PaaS, IaaS, some SaaS, etc., are all contenders.
Laugh Nervously About: That, as with SOA, everything is a cloud; that you can't buy a cloud, yet everyone seems to be trying to sell you one.

3. Efficiency through Outsourcing

Why Cloud? 1. "Owning your computers is as passe as owning your own energy generator" 2. "Do more with less"
Do: Find one or more strategic cloud partners and begin pilot outsourcing.
Don't: Buy more hardware or software to use the cloud. It's snake oil.
Laugh Nervously About: The observation that outsourcing has been a panacea for IT's woes for over 15 years, and last we saw, it seemed like a shell game.

4. Efficiency through Consolidation

Why Cloud? "Your DC's power, thermal, and hardware utilization are awful; you really could improve that. Virtualization was the start, this is the next step"
Do: Buy Cloud Management & Data Center Automation software, use a Cloud Services partner/SI, keep maturing your use of virtualization.
Don't: Really jump into Cloud Definition #3 until your own house is in order.
Laugh Nervously About: The extra software you're expected to buy, and that it seems to require extra hardware too. "Won't Get Fooled Again" by The Who seems like an apt theme song, particularly the final verse.


5. Process Networks

Why Cloud? 1. "The next-generation of the Internet that will tie together process specialization, information integration, social networking, and contextual data" 2. this is sort of where the "Web Services" vision, circa 2002, left off, after which time they made some poor investments in personal hygiene protocols and associated chicanery.
Do: Meditate on the Zen nature of this future evolution of the Internet. Sign on to Twitter. Attend lots of conferences with "2.0" in the title. Maybe buy a BPM tool, or invest in some Strategic Cloud Consulting Services. Clouds #1, #2, #3, and #4 may be useful on the path to nirvana.
Don't: Worry too much about technical details, it's all about your business anyway.
Laugh Nervously About: 1. That no one knows what the fuck these people are talking about, even though there's probably something interesting happening here. 2. That the paint is still wet on BPM vendors renaming themselves Cloud companies.

6. The Rise of Lean IT


Why Cloud? "Reduced lead times to enabling change in your IT environment, thus driving greater business value"
Do: Start redesigning your IT processes. ITIL v3 ain't bad, if you take it with a grain of salt. Pick up some IT automation and management software while you're at it.
Don't: Think that technology alone will solve your problems, this is mostly about organization & culture, baby.
Laugh Nervously About: 1. That the primary industries that have embraced Lean concepts are Automotive and Telecommunications, and the telcos have been talking about it for 10+ years with little sign they're really serious about it. 2. Agile/Lean proponents tend to be backed by a posse of folks that like to write manifestos.

(C) 2003-2010 Stuart Charlton


Disclaimer: All opinions expressed in this blog are my own, and are not necessarily shared by my employer or any other organization I am affiliated with.