Tech: February 2007 Archives

A cautionary tale for those who believe they have the grand interoperability mojo.

There once was a big customer, an XML appliance, and a web services stack. The XML appliance implemented HTTP/1.1 (RFC 2616) and MIME (RFC 2046), as most would expect it to. It also implemented SOAP with Attachments and the WS-I Attachments Profile, as all good enterprisey people should.

But there was something odd about how many carriage return + line feed pairs (CRLFs) to include between the last HTTP header and the start of a multipart/* entity body. The appliance sent three CRLFs, required three, and rejected any message with fewer. The web services stack sent two and expected two, but tolerated more. So multipart messages sent from the stack to the appliance broke.

Big customer complained. XML appliance didn't budge. Web services stack, like all good software groups, believed they were in error, and fixed the issue. All was well for over a year...

...Until application server A came onto the scene. Strangely, it exhibited the same problem the web services stack had many months prior. Big customer complained: You are out of compliance! We require compliance! You are trying to lock us in! The big customer and the application server vendor beat their heads together, thinking that perhaps the RFCs were inconsistent or ambiguous. Eventually big customer decided that compliance was irrelevant (naturally, only after the non-compliance tongue-lashing): interoperability mattered more, whatever the fix.

In the end, of course, the RFCs (one of which dates back to 1996) are not inconsistent. The trouble is that implementers sometimes don't read carefully.

The misleading part is in RFC 2046, Section 5.1.1:

"NOTE: The CRLF preceding the boundary delimiter line is conceptually attached to the boundary so that it is possible to have a part that does not end with a CRLF (line break). "
But reading the BNF shows that this isn't always true:
     dash-boundary := "--" boundary
                      ; boundary taken from the value of
                      ; boundary parameter of the
                      ; Content-Type field.

     multipart-body := [preamble CRLF]
                       dash-boundary transport-padding CRLF
                       body-part *encapsulation
                       close-delimiter transport-padding
                       [CRLF epilogue]

     encapsulation := delimiter transport-padding
                      CRLF body-part

     delimiter := CRLF dash-boundary
Here we see that the first dash-boundary in a multipart body doesn't include a CRLF of its own; that CRLF is rolled into the optional preamble. Unfortunately, it doesn't help matters that the WS-I Attachments Profile, Section 3.12, R2936 says:
"Certain implementations have been shown to produce messages in which the MIME encapsulation boundary string is not preceded with a CRLF (carriage-return line-feed). This creates problems for implementations which correctly expect that the encapsulation boundary string is preceded by a CRLF.... RFC2046 section 5.5.1 clearly requires that all encapsulation boundaries must be preceded with a CRLF (carriage-return line-feed)."
Yikes. I've sent feedback by email to the WS-I indicating that this seems to be a misstatement.
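
The following minimal Python sketch shows how a receiver could locate the opening boundary in the way the BNF above allows. The function name `first_boundary_offset` is hypothetical and does not come from any library. The sketch only finds the first dash-boundary and does not parse the whole body. It illustrates one point: a body that starts directly with `--boundary` and a body that starts with `CRLF--boundary` are both valid, because the leading CRLF belongs to the optional preamble.

```python
import re


def first_boundary_offset(body: bytes, boundary: bytes) -> int:
    """Return the offset of the opening dash-boundary in a multipart body.

    Follows the RFC 2046 production
        multipart-body := [preamble CRLF] dash-boundary transport-padding CRLF ...
    so the CRLF before the first dash-boundary is optional: it belongs to
    the optional preamble, not to the boundary itself.
    """
    # dash-boundary, then transport-padding (spaces/tabs), then CRLF.
    # Requiring the trailing CRLF keeps "--abc" from matching "--abcd"
    # or the close-delimiter "--abc--".
    line = re.escape(b"--" + boundary) + rb"[ \t]*\r\n"

    # Case 1: no preamble at all; the body begins with the dash-boundary.
    if re.match(line, body):
        return 0

    # Case 2: a (possibly empty) preamble, then CRLF, then the dash-boundary.
    m = re.search(rb"\r\n" + line, body)
    if m is None:
        raise ValueError("no opening boundary found")
    return m.start() + 2  # skip the CRLF that terminates the preamble
```

A receiver that handled only the second case, as the appliance effectively did, would reject messages that the grammar permits.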

Informal testing (YMMV) shows spotty agreement on how many CRLFs belong between the last HTTP header and the first MIME boundary:

  • Application Server A inserts 2 CRLFs, expects at least 2
  • Application Server B inserts 3 CRLFs, expects at least 2
  • Application Server C inserts 2 CRLFs, expects at least 2
  • Web Services Library A inserts 3 CRLFs, expects at least 2
  • Web Services Library B inserts 2 CRLFs, expects at least 2
  • XML Appliance inserts 3 CRLFs, requires at least 3
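
The Python sketch below can help anyone who wants to repeat this kind of informal test against captured raw HTTP messages. It assumes the count starts with the CRLF that ends the last header line and runs up to the first non-CRLF byte of the body. It also assumes the headers are terminated by a plain CRLF CRLF. The helper names `crlfs_before_body_content` and `accepts` are hypothetical.

```python
def crlfs_before_body_content(message: bytes) -> int:
    """Count CRLFs from the end of the last HTTP header line up to the
    first non-CRLF byte of the entity body (assumed counting convention)."""
    _head, sep, body = message.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line terminating the HTTP headers")
    i = 0
    # Any further CRLFs fall inside the optional MIME preamble.
    while body.startswith(b"\r\n", i):
        i += 2
    # Two CRLFs end the header block: one closes the last header line,
    # the other is the empty line separating headers from the body.
    return 2 + i // 2


def accepts(message: bytes, minimum: int) -> bool:
    """Model a receiver that expects at least `minimum` CRLFs."""
    return crlfs_before_body_content(message) >= minimum
```

Under the RFC 2046 grammar, both 2 and 3 are legitimate counts. A receiver behaving like `accepts(message, 2)` therefore interoperates with everything in the list. A rule like the appliance's `accepts(message, 3)` rejects the implementations that insert only 2.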
The morals of this story:
  1. do not just trust specification text -- read the formal grammar
  2. "compliance" doesn't necessarily mean interoperability
  3. software seems more forgiving than hardware

JetBlue


Guess what infrastructure JetBlue runs? I recall reading about their Microsoft-only environment back in 2001 and thinking, "this could eventually bite them hard." Not that I'm claiming it caused the operations meltdown; that information isn't public. I also believe that Microsoft's infrastructure can scale quite well.

The trouble is that, in my experience, some IT managers falsely believe that Microsoft's software infrastructure is a magical elixir for keeping infrastructure costs low. That's tripe. Picking one vendor over another is no panacea for keeping infrastructure costs down in the face of increasing demand.

Maybe when JetBlue built out its infrastructure, Microsoft's approach really was the best way to keep costs low, given the combination of developer productivity, hardware + software costs, maintenance & support costs, training costs, etc. But apparently they didn't revisit their scalability assumptions against problem scenarios like the recent Valentine's Day storms.

Broad, sweeping generalization time. There are two types of managers: those who want to sign a check and not think about their problem, and those who want to think their way through it. The latter is politically riskier, but the former is much riskier in reality. It's not that Microsoft's stuff can't scale; it's that management doesn't invest in scaling it as demand increases, because they signed a check and "it's supposed to work," as all elixirs should! The same could be said for large IT outsourcing or offshoring deals, with questionable results. (I could write an entire post about management-by-spreadsheet now, but I'll stop...)

The real question is where the "straight and narrow path" of your chosen infrastructure hits the scalability wall. At some point, building an infrastructure on a shoestring (and without systems architects who have a performance-specialist background) is going to break your (and your vendor's) default scalability assumptions.

You need to actually know *what* scalability your hardware and software combination is capable of, and not just blindly follow the well-trodden path of PHP docs, MSDN, IBM developerWorks, or BEA's eDocs. As Neil Gunther would say, your team needs to know, and agree on, which part of the scalability elephant they're feeling.


About this Archive

This page is an archive of entries in the Tech category from February 2007.


About Me
(C) 2003-2008 Stuart Charlton


Disclaimer: All opinions expressed in this blog are my own, and are not necessarily shared by my employer or any other organization I am affiliated with.