Recently in Tech Category

The trouble with APIs

| 6 Comments | No TrackBacks

I posted a couple of thoughts on Twitter and a few comments elsewhere recently as a results of a discussion on George Reese's blog and William's followup. I thought I'd repost these thoughts here with annotation as it seems to be evolving an interesting discussion.

1. REST APIs are a case of good intentions run amok. The point of REST was to *eliminate* specific APIs. Not cause their proliferation.

2. When someone says you shouldn't document a REST API, they're saying you shouldn't have an API. Not that you shouldn't document.

I mean by this, Network APIs. I also mean APIs that aren't re-implementable across sites. The point is that the uniform interface isn't just HTTP - it's a small number of evolving media types, and the goal is to enable sharing of information at a global scale. Your pet JSON format may not be the way to do this.

3. That said, RESTians saying you shouldn't have an API are somewhat vacuous because they offer no good alternative. Custom media types ain't.

4. Thus, IMO, until we have a workable generic media type or two for the 80% case of "REST APIs", we're just growing a frustrating mess.

In other words, I completely recognize that crafting a media type is a bit heady, and even still a custom media type for generic use is quite hard. With a REST API at least you get the benefits of HTTP GET (to some degree). It does the job, if all you want is a trendy vaguely hopeful way of building something for the long term without getting too bogged down in philosophy. I get it.

What I would like to see is a generic media type (above & beyond a "format" like JSON) to cover the 80/20 cases. This is seemingly contrary to where many RESTheads are at - they prefer often prefer opinionated media types, media types that "say something".

I'm actually asking for that too, I think -- say something about the programming model that breaks people's habits away from RPCs. Find a programming model that doesn't orient all "write-side" communication in terms of a flat list of actions, and into something that can be extended & shared across the web via hyperlinks and URIs. Hierarchical statecharts is my stab at this. There may be others.

5. From William's blog (with a couple tweaks)....

I've sort of half given up on REST advocacy because it's pretty impossible these days. People want to ship APIs that will be valid for the next next 6 months, not build systems that will evolve over a decade or more.

Jan Algermissen's comment about how when you've documented an API, you're not really doing REST, is actually spot on, fundamentally, but is met with ridicule. I can completely understand how vacuous the comment sounds if you just want to ship an API nownownow, are being short-term practically minded, or are just playing buzzword bingo with architectural styles. But if we actually want to leverage the benefits of the style, we'd work on the core issue of having a generic media type or two for these sorts of use cases.

If one is thinking of "how to methods map to my URI structures", we've basically lost the plot as to why we have HTTP, URIs and media types in the first place. The goal is to *eliminate* the API as we normally think of it. (It's not about a library of calls over a network).

Five points. The consequences of REST changes some rules:

1. The assumption can't be that the end user of your information service is going to be a programmer. Building an API bakes that assumption in how you expose your information or controls to manipulate it. The assumption rather should be that "this can be any agent of any sort". If your goal really is "I want a programmer to learn my API and program against this", then there's really no reason to use REST other than that it's trendy (and HTTP GET).

2. The assumption can't be that you will be the only site implementing your interface. The whole point of the Web is sharing information - publishing an API bakes in the assumption that YOU, the publisher, are in control. It is the consumer, on the other hand - the agent itself, that's in control of how/what the information is consumed or manipulated.

3. The semantics of the interface that need to be documented are the media type that you use to describe your data and links. The structure of your URIs really shouldn't be a part of said semantics.

4. But doing this in a custom media type implies #2, that you're the only person implementing your interface. Of course, every media type has to start somewhere, but it really doesn't help matters unless you understand that others may be *implementing* your interface, not just *talking* to it. Facebook's Open Graph, for example, is an example of a custom overlay media type that takes great pains to be generic in either direction.

5. All of the above is really heady stuff that is above most developer's pay grades, so it won't ever happen. The best we can hope for is some person/organization to step up with a generic media type or two + guidelines that help describe and document a Web of Data interface for accessibility by programmers. Call it a description language, if you'd like. Some RESTians don't like the idea of a description language. I think it's an essential 80/20 piece if developers are going to actually *use* the architecture the way it was meant to be used.

Currently (as I mentioned in my WS-REST keynote) I think some kind of hierarchical lifecycle/state machine might be the right abstraction for such an interface. I'd like to demonstrate this if I ever got past my day job sucking up most of my time. (But I may have an opportunity to work on this there soon.)

Slides for my keynote, "I'll See you on the Write Side of the Web" at the 2nd International Workshop for RESTful Design, WWW 2011 in Hyderabad, India are available here.

Recording (75 minutes) is now available at this link. (It's an M4A / AAC file, sorry).

I'll eventually post my thoughts on the workshop along with feedback & discussion from this keynote.

WS-REST 2011 Keynote

| No Comments | No TrackBacks

I will be giving the keynote at the 2nd International Workshop on RESTful Design, entitled: I'll See You on the Write Side of the Web.

Yes, I do wonder if I have Brain Damage some days ;)

Here's the abstract:

While global-scale digital information sharing has become synonymous with REST,
state manipulation remains an inconsistent experience. Many so-called "RESTful
APIs" are specified out of band, through documentation, rather than in
hypermedia. Developers struggle with constructing their own custom
hypermedia-driven media types, falling back on lightweight site-specific data
structures. And the few standardized publishing media types, such as the Atom
Publishing Protocol, have had limited adoption. All of this serves to undermine
the serendipitous reuse of "writes", a quality we take for granted when dealing
with "reads". Of course, state-changing actions are semantically much more
difficult to describe generically than a simple HTTP GET, and it is an open
question as to whether it's even possible to do so practically.

This talk will explore whether there's an opportunity to provide generalized
media types and programming frameworks to assist with resource state
manipulation, and what they might look like, drawing inspiration from other
computing domains with dynamic environments, such as video games, robotics, and
embedded systems controllers.

Slides will be posted around the same time as the keynote.

The Second International Workshop on RESTful Design (WS-REST 2011) is a forum for discussion and dissemination of research on the the Web's architectural style, Representational State Transfer (REST). It will be held on March 28, 2011 as part of WWW 2011 in Hyderabad, India.

Last year had a high quality set of attendees and submissions, and was a great opportunity to meet those involved in this community.

I'm on the program committee again this year. We are accepting four-page position papers, or eight-page full research papers on your favourite REST topic. Send your papers in by Jan 31st! Even if you can't make it to India, it will be a great forum to disseminate your research!

Time for REST is over

| 4 Comments | No TrackBacks

Just read William's Partial resource update, one more time, which is another kick at the partial resource update can.

The problem is that few web services or RESTful hypermedia describe a data model. Representations themselves don't often describe a data model, they describe a format. This is the same problem as WS-RT, in that they're trying to describe updates to a format (XML), not a data model.

In Atom, this doesn't matter, as it kicks the responsibility to the entry.
In HTML, this doesn't matter, as the machinery of browser & server only needs the format for interoperability.

In systems-to-systems communication, this matters a whole lot, as you're coupling your expectations differently with every media type.

A data model is foundational, it describes the mathematical basis for how data is managed. Examples (for better or worse) include:

  1. Key/Value

  2. Arrays

  3. Relations

  4. Logical Graph

  5. Inverted List

OData describes an implicit data model, based on relations. RDF and OWL describe graph data models with an explicit theoretic foundation that makes them more powerful than Relations.

"Your New JSON Service" probably has a data model, but I bet it's unpublished, and in your head. And even if you did publish it, it probably is different from "this other guy's JSON Service".

What does all of this mean? I think it means that there's two big pieces of work to be done to help evolve RESTful web services to better interoperability:

(a) a data model that covers 80% of common use cases and can be formatted with JSON and XML. This needs to be a closed-world model, like most databases, and thus I don't think RDF qualifies.

(b) a media type that covers 80% of common use cases to describe a resource's lifecycle and state transitions -- in other words, to make POSTs self-descriptive in hypermedia. Because the world of computing isn't just about updating data, it's about abstractions.

It is passe' to perform SQL updates directly from your GUI. It's just as passe' to expect every RESTful hypermedia API to just behave like a database entity. So, we need something in between, something that gives POST meaning to systems. POST is the method which powers most of what we DO (vs. read) on the Web anyway. This "roll your own media type" trend works for simple situations, but really isn't sustainable in the long run, in my opinion.

p.s. yes, that "Part 2" of building a hypermedia agent article is coming ;-)

Put on your Thinking CAP

| 9 Comments | No TrackBacks

In light of some interesting twitter, blog, and IM discussions over yesterday's post (thanks to Chad Walters, Billy Newport, Ryan Rawson, Bill de hOra, Jeff Darcy, and others), I've been pondering three CAP points. Last post on this topic (for a bit), I promise, before I return to your regularly scheduled RESTful programming.

1. Definitions Matter. Or, are CA/CP are the same thing?

Gilbert & Lynch's paper defines the following (I'm not going to use their exact terms, so as to keep this accessible)

  • Consistency (big-C)- aka Atomicity, or a total order on operations (as visible to any user); aka. by database practitioners as serializable isolation. Makes distributed systems look like a single system.
  • Eventual Consistency (little-C)- aka Delayed-t Consistency, or a bounded amount of time before a data object is consistent, also implies a minimum latency between writes on a single piece of data (without specialized conflict resolution)
  • Weak Availability (little-A)- every request received from a non-failing node must result in a response, but this response may take forever to come (i.e. latency is not a part of this definition)
  • Strong Availability (big-A)- every request received from a non-failing node must result in a response even when network partitions occur, but this response may take forever to come (i.e. latency is not a part of this definition)
  • Strong Network Partition Tolerance (big-P)- the network will be allowed to lose arbitrarily many messages sent from one node to another (not counting total network outage). When a network is partitioned, all messages sent across the partition are lost.

I'm going to add the following:

  • Weak Network Partition Tolerance (little-P)- a system that can tolerate some network partitions, but not all combinations of them.

I add this due to my reading of their definition of Big-P and sections 3.2.1 and 3.2.2: "If there are no partitions, it is clearly possible to provide atomic, available data. In fact the centralized algorithm described in Section 3.2.1 meets these requirements. Systems that run on intranets and LANs are an example of these type of algorithms". Section 3.2.1 defines an algorithm for Consistent & Partition Tolerant systems. Hang on to this one, I'll finish my point in a moment.

So, by these definitions, you basically can have (assuming mixed reads & writes):
- AP: eventual consistency, strong availability, strong partition tolerance
- CP: strong consistency, weak availability, strong partition tolerance
- CA: strong consistency, strong availability, weak partition tolerance

Clearly an AP system is different from CP/CA. But the crux is whether CA and CP are the same kind of system. If you consider the idea of weak vs. strong partition tolerance, the difference becomes clear (to me, anyway, but I'm admittedly blonde).

A CA system might be able to handle some partitions, but not all. Simply put, it has a single point of failure, somewhere. The example I like to give is a shared-disk cluster database with a SAN. Partition the SAN, and you're hosed. Partition the nodes, or even the interconnect, and you will still be available, albeit with higher latency. Redundancy is the common way to reduce this risk.

Whereas a CP system is designed to handle all partitions, i.e. it has no single point of failure. But some non-failed nodes may not be able to service clients at all during a partition. This can suck. Anyone who's managed a synchronous log shipping setup or EMC SRDF probably knows about this one.

2. Scope matters

The definition of "node" and "network" can vary depending on your granularity and scope.

The above definitions only make sense if the scope of the system is the server-side of a client/server relationship. (i.e. How can non-failed nodes experiencing a partition still receive and respond to a request? Only if the client isn't the one being partitioned.)

A database that's normally a CA system can be considered CP if you zoom out to notice that it's in a multi-node synchronous replication ring. Or AP if it's in an asynchronous multi-node replication ring. But notice that, probabilistically, it's behaving like a CA system for 99.9% of the time (or however long your MTBF is).

An AP system on the other hand has one big drawback, one that's not spoken about often. It's about the scope of the partition: is it recoverable or not?

An unrecoverable partition in a CP or CA system is "no data loss", even if you can't get at the system. That's not true in an AP system, if there's an unrecoverable error in a certain set of nodes during the "delayed consistency" time window. This occurs during catastrophic media failures, like a bug in the SAN controller corrupting all the disks, trucks driving through a data center, or floods, or bombs, hurricanes, etc.

Even with backups, during this "replication latency", you have to re-enter those transactions, or hope that a CP or CA system somewhere has a copy.

3. Definitions are too easy to get hung up on.

One of the main reasons I'm talking about this is because I see bolded, underlined claims that can be easily misconstrued, like, "you can't choose consistency and availability together in a distributed system". This gives me The Fear.

This point of the above quote, is not to say you can't have some consistency. You just can't have fully serializable consistency and isolation.

But in practice, when you think about it, this claim is actually rather banal. Practitioners have long understood that there are a spectrum of consistency & isolation tradeoffs at scale, even at a single node. Why?

Because the definition of availability arguably needs to also include latency, which is something Daniel Abadi brought up. Serializability means locks, in practice, and that means increased latency, and thus reduced availability. This is why we don't like Two Phased-Commit very much these days.

But, for kicks, go back to the ANSI SQL isolation level debate that Jim Gray stirred up. It was because databases like Oracle (gasp) don't provide serializable consistency, and haven't since the late 1980s! They provide snapshot consistency, which is a different beast where readers don't block writers, and arguably was the primary technical reason for Oracle's success in the market place through the 1990s. Amusingly, this is the same argument that Jim Starkey keeps bringing up when discussing CAP, having invented the idea of multi-version concurrency control, when he's talking about his new baby, Nimbus DB.

So the idea that you can't have fully serializable consistency at scale -- even in a single node database -- is completely uncontroversial. It's when the followup feels like "...and therefore you need to throw out your RDBMS", that cognitive dissonance and tempests in a tweetpot stirs up.

Confused CAP Arguments

| 6 Comments | No TrackBacks

A bit of a Twittergasm lately over the CAP Theorem, with relevant blog posts

I am confused. I've narrowed it down to 7 matters.

Confusion #1: Changing system scope from underneath our feet
The biggest confusion I have is that the scope of "distributed system" under the CAP theorem seems to have changed, silently.

If you look at the original papers:

All of them seem to scope the CAP theorem to the server side of a client/server relationship. That is, it's assumed that the distributed system under consideration is a client/server architecture, but the server-side is distributed in some way. This is a reasonable assumption, since the Web is a client/server architecture variation, and CAP was originally about the tradeoffs in huge scale web sites.

But practically speaking, this is particularly important when considering Partition Tolerance. In the original papers, CA (Consistent, Available) was absolutely common, and in fact, the most common scenario, with Brewer giving examples like cluster databases (on a LAN), single-site databases, LDAP, etc. The most popular commercial cluster databases today arguably retain this definition of CA, such as Oracle's RAC.

But now, AP (availability, partition tolerance) advocates are questioning if CA distributed systems are even possible, or merely just something that insane people do. It's a very strange, extreme view, and I don't think it offers any enlightenment to this debate. It would be good to clarify or at least offer two scopes of CAP, one within a server scope, and another in a system-wide scope.

Confusion #2: What is Partition Intolerance?

This is never well defined. Like lactose intolerance, the results, can be, err, unpredictable. It seems to imply one of: complete system unavailability, data inconsistency, or both. It's also unclear if this is recoverable or not. Generally it seems to depend on the implementation of the specific system.

But because of this confusion, it's easy to let one's mind wander and say that a CA system "acts like" an AP or CP system when experiencing a partition. That leads to more confusion as to whether a CA system really was just a CP system in drag all along.

Confusion #3: Referring to Failures Uniformly

It seems to be a simplifying assumption to equate failures with network partitions, but the challenge is that there are many kinds of failures: node failures, network failures, media failures, and each are handled differently in practice.

Secondly, there are systems with many different kinds of nodes, where they tolerate partitions in one kind of node, but not another kind, or just maybe just not a "full" partition" across two distinct sets of nodes. Surely this is an interesting property, worthy of study?

Taking Oracle RAC or IBM DB2 pureScale, as an example. It actually tolerates partitions of many sorts - cluster interconnect failures, node failures, media failures, etc. What it doesn't tolerate is a total network partition of the SAN. That's a very specific kind of partition. Even given such a partition, the failure is generally predictable in that it still prefers consistency over availability.

Confusion #4: Rejecting Weak vs. Strong Properties

Coda hale sez: "You cannot, however, choose both consistency and availability in a distributed system [that also tolerates partitions]".

Yet this seems very misleading. Brewer's original papers and Gilbert/Lynch talk about Strong vs. Weak consistency and availability. The above statement makes it sound as if there is no such thing.

Furthermore, most distributed systems are layered or modular. One part of that system may be CA-focused, the other may be AP-focused. How does this translate to a global system?

Confusion #5: Conflating Reads & Writes

Consider the Web Architecture. The Web is arguably an AP system for reads, due to caching proxies, but is a CP system for writes.

Is it useful to categorize a system flatly as binary one or the other?

Confusion #6: Whether AP systems can be a generic or the "default" database for people to choose

AP systems to my understanding are application-specific, as weak consistency is difficult to correct without deep application knowledge or business involvement.

The question and debate is, what's the default database for most IT systems... should it be a traditional RDBMS that's still a CA system, like MySQL or Oracle? Or should it be an AP system like a hacked-MySQL setup with Memcached (as AP), or Cassandra?

When I see claims that CA advocates as "not getting it", or "not in their right mind", I have a severe case of cognitive dissonance. Most IT systems don't require the scale of an AP system. There are plenty of cases of rather scalable, highly available CA systems, if one looks at the TPC benchmarks. They're not Google or Amazon scale, and I completely agree that an AP system is necessary under those circumstances. I even agree that consistency needs to be sacrificed, especially as you move to multi-site continuous availability. But it's very debatable that the basic database itself should be CA by default or AP by default.

For example, many IT shops have built their mission-critical applications in a CA manner within a single LAN/rack/data-centre, and an AP manner across racks or data-centres via asynchronous log shipping. Some can't tolerate any data loss and maintain a CP system with synchronous log shipping (using things like WAN accelerators to keep latency down). And not just for RDBMS - this is generally how I've seen Elastic Data Cache systems (Gigaspaces or Coherence or IBM WebSphere ExtremeScale) deployed. They deal with the lack of partition tolerance at a LAN level through redundancy.

It's useful to critique the CA single-site / AP multi-site architecture, since it's so common, but it's still not clear this should be replaced with AP-everywhere as a useful default.

Confusion #7: Why people are being a mixture of dismissive and douchey in their arguments with Stonebraker

Stonebraker's arguments make a lot of sense to IT practitioners (I count myself as one). He speaks their language. Yes, he can be condescending, but he's a rich entrepreneur - they get that way. Yes, he gets distracted and talks about somewhat off-topic matters. Neither of those things implies he's wrong overall. Neither of these things imply that you're going to get your point across more by being dismissive or douchey, however good it feels. (I'm not referring to anyone specifically here, just a general tone of twitter and some of the snark in the blog posts. )

One particular complaint I find rather naive - that he's characterizing CAP theorem (more AP system) proponents as only using CAP theorem to evaluate distributed system availability. Clearly that's not true, but on the other hand, it is kind of true.

Here's the problem: when I hear venture capitalists discussing the CAP theorem over cocktails because of their NoSQL investments, or young newly-graduated devs bugging their IT managers to use Cassandra, you can bet that plenty of people are making decisions based on a twisted version of the new fad, rather than reality. But that's how every technology fad plays out. I think Stonebraker has seen his share of them, so he's coming from that angle, from my view. Or he could just be an out of touch dinosaur like many seem fond of snarking. I personally doubt it.

Part of my continued series of reflections on the WS-REST workshop.

It may continue to amaze, but the most basic aspects of interoperability continue to have a common problem: the implementation almost always violates the protocol specs that it supports. And that violation often gets encoded into practice, and back into the spec. Sometimes this is for good reasons, sometimes not.

Sam drew from his experience with early SOAP interoperability to illustrate the problem. Apache SOAP, for example, a descendent of IBM SOAP4J, and the predecessor to Apache Axis, had a history of supporting the early SOAP spec, when it was a vendor-led W3C note. It had encoded data with the SOAP encoding section of the SOAP specification, because the XML Schema data type system was not due to be out until mid 2001.

Of course, early specifications often under-specify, and that leads to implementation leaking through. A classic borderline cases is representing "Infinity". XML Schema suggests that infinity should be represented as the characters "INF". Apache SOAP, predating this, used the java.util.Number class, whose toString() method generated "Infinity".

The fix is simple enough, but there were deployed systems out there already. The solution was to continue to produce "Infinity", but to accept both "INF" and "Infinity". Future SOAP stacks, like Axis, which were built around XSD in mind, both produced and consumed INF.

My first reaction to a lot of this was, "conform to the (damn) specification". If it were only so simple. Sometimes the spec is wrong or incomplete. Sometimes a boundary condition is missed by implementers. And organizational inertia makes it hard to upgrade. So, documenting what is implemented becomes the preference.

None of the above has really any difference with REST. One of the takeways: contracts matter.

I know that there's trepidation about the term "contract" among those in the REST community, in that it may envision a rigid interface. I think it's a price worth paying -- you cannot eliminate the notion of "contract" when interoperability is a concern. Contracts are not about how rigid or flexible the interface is. They are about ensuring complete, understood, and compliant expectations. And, frustratingly, they cannot be well-specified if there isn't any implementation experience.

One might argue that the Web has proven very good at composing contracts together -- transfer protocol, media types, identifiers, etc. It didn't eliminate the painstaking need for agreement, it depends on it. Those interested in philosophy might find this an interesting paradox, that the explosion of freedom & diversity of web content and applications requires strong control at the lower layers -- an idea Alexander Galloway explores in his book, Protocol: how control exists after decentralization. Practically speaking, the areas of the Web where there was lax specification and/or complex implementation compromises are the areas where there's not a lot of diversity - such as browsers, which have to deal with HTML, CSS, JS, etc. Servers haven't been immune from this, but they have somewhat more understandable specs to follow, with HTTP, MIME, URI, etc., though ambiguity with HTTP has led to a dearth of complete HTTP 1.1 deployments (most use a subset). This has led to the work on HTTPbis to clean up and clarify the spec based on over a decade of experience.

Considering today's new generation of RESTful APIs on social media & cloud platforms, there are some curious trends. JSON-based RESTful services, for example, do not specify dates, or formal decimals. XML (schemas) do have standardized simple types, but the syntax and infoset is off-putting due to its lack of fidelity with popular programming languages. Which is better for interoperability? Maybe the answer is because JSON is so easy to parse, it's ok that each endpoint needs to adapt to a particular set of quirks, like differing date/time formats. But that sounds like a very developer-centric argument, something that serves the lazy side of the brain. We can do better.

Part of my continuing series of reflections on the REST workshop at WWW 2010...

Sam Ruby's keynote began with a call for technologists to back off the extreme positions often taken in technical arguments (likely because he's knee deep in the middle of the mother of all technical standards debates). Using Mark Pilgrim's parable of the web services spec developer fighting to the death over Unicode Normalization Forms, he illustrates that technical advocacy can lead to knee jerk, us vs. them reactions on very minute details, especially when different mindsets collide and have to work out a path to interoperability.

Relating this to the "REST community" (whatever that means), I think we seem to have a built-in defensive posture, and aren't quite used to being in the driver's seat for innovation. There were so many attempts to "fix" the Web into some other architecture: HTTP-ng tried with a distributed object model, the Web Services crowd tried again. The slow counter-revolution of "using the Web" was largely grown out of Mark Baker's health and savings account. And now, REST is now a technical pop culture buzzword, being perverted in meaning as all buzzwords are prone to. Roy's gotta defend the meaning of the word if it's to mean anything. So, in a way, it's completely understandable why RESTafarians honed a defence mechanism against misleading arguments that they fought ad nauseam.

Unfortunately, today, this defence mechanism seems to have limited free discussion of the limitations of REST, the areas of useful extension, the design of media types, and the ability to relate or objectively discuss extensions to REST that may evaluate the benefits of decades-old ideas like transactions, reliable messaging, or content-level security, without being patronizing ("you don't need transactions", or "reliable messaging isn't really reliable", "use SSL, it's easier", etc.). Researchers and standards working groups try to follow the style, or publish extensions, but anticipate the dreaded Roy bitchslap, perhaps because there seems to be a lack of resources (tools, documentation, frameworks, examples in different use cases) to learn how to apply the style, or perhaps because people are being lazy in applying the style (glomming onto a buzzword). Hard to tell.

In short, this is a hard problem to resolve, as there are good reasons for people's strong opinions. Compromise can only be achieved if there's shared understanding of the facts & shared values. Mostly I see shared values in terms of the properties that people find desirable in a networked system (interoperability, extensibility, visibility, etc.), we still seem to be missing a shared understanding of the facts (such as the basic idea of "emergent properties" in software architecture, and the idea that they can be induced through constraints -- which I think many still reject.)

To be fair, there have been plenty of extensions out there, include the ARRESTED thesis from Rohit Khare, various public RESTful APIs and frameworks out there, Henrik Nielsen's use of RESTful SOAP services in Microsoft Robotics studio, etc.

Look at the reference to the latter, for example. There are plenty of "specific methods" being defined while fitting under the POST semantics. Some would have a problem with that, but I'm not sure what the alternative would be. There are "state replacement" methods under the PUT semantics if you prefer that. But some clients may prefer the more specific, precision state transfer of a POST method. Would you consider this heresy? Or would you stop and realize that Henrik was in the front row of the WS-REST workshop, and co-authored too many W3C notes and IETF RFC's to enumerate here?

I'll speak more about this problem with the "write side of the web" soon, but in short, we don't really have a good way of describing the pre-conditions, post-conditions and side-effects of a POST in a media type, which would be probably the more "RESTful" thing to do. Mark Baker took a stab at this 8 years ago, but there's hasn't been much pickup. In lieu of that, the simplest thing might be to make an engineering decision, slap a method name & documentation on the URI+POST interaction, and move on. You'd be no worse than a WSDL-based Web Service, with the added benefit of GET and URIs. Or, depending on your use case, it may mean crafting a full media type for your system, though I'm not sure that's "simple".

This dilemma is why I tend not to have a problem with description languages like WSDL 2.0 or WADL -- they're not complete, certainly, but they're a start to wrapping our heads around how to describe actions on the Web without breaking uniformity.

Most of those I spoke to at WS-REST felt was that media types, tools, and frameworks for REST have been growing slowly and steadily since 2008, but lately there's been a major uptick in activity: HTML5 is giving browsers and web applications a makeover, OAuth is an RFC, several Atom extensions have grown, Facebook has adopted RDFa, Jersey is looking at Hypermedia support, etc.

But, coming out of the WS-* vs. REST debates, it is interesting in noting what hasn't happened the past two years, since the W3C Workshop on the Web of Services, which was a call to bridge-building. As expected, SOAP web services haven't really grown on the public-facing Internet. And, also as expected, RESTful services have proliferated on the Internet, though with varying degrees of quality. What hasn't happened is any real investment in REST on the part of traditional middleware vendors in their servers and toolsets. Enterprises continue to mostly build and consume WSDL-based services, with REST being the exception. There has been, at best, "checkbox" level features for REST in toolkits ("Oh, ok, we'll enable PUT and DELETE methods"). Most enterprise developers still, wrongly, view REST as "a way of doing CRUD operations over HTTP". The desire to apply REST to traditional integration and enterprise use- cases has remained somewhat elusive and limited in scope.

Why was this?

Some might say that REST is a fraud and can't be applied to enterprise integration use cases. I've seen the counter-evidence, and obviously disagree with that assessment, though I think that REST-in-practice (tools, documentation, frameworks) could be improved on quite a lot for the mindset and skills of enterprise developers & architects. In particular, the data integration side of the Web seems confused, with the common practice being "make it really easy to perform the required neurosurgery" with JSON. I still hold out hope for reducing the total amount of point-to-point data mapping & transformation through SemWeb technologies like OWL or SKOS.

Perhaps it was the bad taste left from SOA, or a consequence of the recession. Independently of the REST vs. WS debate, SOA was oversold and under-performed in the market place. Money was made, certainly, but no where near as much as BEA, IBM, TIBCO, or Oracle wanted, and certainly not enough for most Web Services-era middleware startups to thrive independently. It's kind of hard to spend money chasing a "new technology" after spending so much money on the SOAP stack (and seeing many of the advanced specs remain largely unimplemented and/or unused, similar to CORBA services in the 90's).

Or, maybe it was just the consequence of the REST community being a bunch of disjointed mailing list gadflys, standards committee wonks, and bloggers, who all decided to get back to "building stuff" after the AtomPub release, and haven't really had the time or inclination to build a "stack", as vendors are prone to.

Regardless the reason, the recent uptick in activity hasn't come from the enterprise or the enterprise software vendors offering a new stack. The nascent industries that have invested in REST are social computing, exemplified by Twitter, Facebook, etc. and cloud computing, with vCloud, OCCI and the Amazon S3 or EC2 APIs leading the way.

The result has been a number of uneven media types, use of generic formats (application/xml or application/json), mixed success in actually building hypermedia instead of "JSON CRUD over HTTP", and a proliferation of toolkits in JavaScript, Java, Ruby, Python, etc.

We're going to be living with the results of this first generation of HTTP Data APIs for quite some time. It's time to apply our lessons learned into a new generation of tools and libraries.

About this Archive

This page is an archive of recent entries in the Tech category.

Society is the previous category.

Find recent content on the main index or look in the archives to find all content.

About Me
(C) 2003-2011 Stuart Charlton

Blogroll on Bloglines

Disclaimer: All opinions expressed in this blog are my own, and are not necessarily shared by my employer or any other organization I am affiliated with.