August 31, 2003

sometimes it's better to keep quiet

Instead of posting my thoughts, sometimes I think it's better for me to shut up and watch the fireworks. I find after I post any longish post or rant, I change my mind after I hit "send". Sigh.

Ok, so after my longish post to TSS about web services, I read Werner Vogels' excellent Web Services are NOT Distributed Objects. This is indeed the first time I've seen someone who's been a part of the dist-obj community in the past actually take this position, so I'm quite interested in what he has to say. I'm wondering if this will change my mind about what I said just a few hours ago...

At first I thought "great, just when I thought I had it figured out, my paradigm's going to have to shift". After reading the article though, I think it resonates with me because most of what he says is stuff I already believed. But it did change my mind about some things, in a sense. It reminded me of stuff that I knew a couple of years ago but, mired in the practicalities of day-to-day technology, had forgotten.

I do believe web services are different from distributed objects. I've detailed in other places what I felt web services to be, and that page hasn't changed much in the 2+ years since I wrote it, so I think it's still relevant. As I see it, the differences are primarily a) intrinsic message structure vs. dependence on an extrinsic definition language, b) thus the web services "document" model is a lot more like a dynamically interpreted interface (DII / IDispatch / RMI+Reflection) than a traditional static dist-obj interface in the CORBA/COM world, and c) it's not based on any object model at all, so we don't have to worry about platform religious wars.

Werner takes the perspective of the old "stateful vs. stateless" debate. Web services are stateless at their base level, he says, whereas distributed objects (COM notwithstanding) are stateful. Granted. I have, however, been somewhat biased by practice here: stateless EJB session beans tend to be the norm as entry points into today's distributed object systems.

As for "document oriented computing", while I've believed this is important, I've seen a lot of mixed messages here from various parties, so I've been kind of confused about what it means. It seems to me that every vendor is hell-bent on retrofitting RPC on top of web services as if that were the end goal. I've seen a few fairly influential articles from old COM folks who were very eager to make Web Services as static as COM/CORBA. On the other hand, Microsoft .NET has almost totally embraced the doc+literal model, which is good. Perhaps I just stayed out of the web services front-line discussions for too long and there was a switch at some point among these types, but I don't think so -- I'd also like to note that the other major platform vendor out there dropped their messaging-oriented web services API (JAXM) to focus on JAX-RPC. That's a very sad thing, and I wonder if it will hurt Sun in the long run. I do admit that programming for JAXM is more painful than for JAX-RPC given current tools, but perhaps JMS will subsume what JAXM was.

One thing I hear a lot, and Werner's essay echoes this, is that "document exchange" is very different from "object interfaces". At the core, I suppose this is true. But, as I go around training people in J2EE, EJB, .NET, and Web Services, I try to find a way to relate the approaches. How is "document exchange" really different from traditional message passing, specifically the self-describing messages of some products like TIBCO RV?

The way I've taught it is this: take your object interface, and reduce it to one method: void execute(Object) or Object execute(Object). (Or, alternatively, look at it like you would a JMS MessageListener: void onMessage(Message). Or like any other MOM-based system where you get a callback.) The object coming in and going out can be an object graph (bound to XML through some mapping), or a DOM tree, or a pipeline of SAX events - the idea is to take the "surface area" of your interface and place it into your data. With only one physical entrypoint, the data itself can map to further logical actions and/or events - multiple schema instances in the same message. All of this is fairly fuzzy, but it's the general direction I've been thinking. I've always seen WSDL as a "crutch" technology - it's a way of hacking up a document into procedures & arguments, but it doesn't necessarily imply "RPC". But perhaps I'm wrong on this; I know there are many who view WSDL as essential (perhaps they're just subscribing to the RPC uber alles school).
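To make the "one method" idea a little more concrete, here is a rough sketch in Java of a service whose only entry point is execute(), taking a DOM tree. Everything named here (DocumentService, register, the placeOrder and cancelOrder elements, the plain-text reply) is made up for illustration, and it uses present-day Java syntax; it is one possible reading of the idea, not a reference design.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical service with a single physical entry point: execute(Document).
class DocumentService {
    // Logical actions, keyed by the element name that triggers them.
    private final Map<String, Function<Element, String>> handlers = new HashMap<>();

    void register(String elementName, Function<Element, String> handler) {
        handlers.put(elementName, handler);
    }

    // The "surface area" lives in the data: each child element of the root
    // selects a logical action. Elements nobody registered for are skipped.
    String execute(Document message) {
        StringBuilder reply = new StringBuilder();
        Node child = message.getDocumentElement().getFirstChild();
        for (; child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // ignore whitespace, comments, etc.
            }
            Element part = (Element) child;
            Function<Element, String> handler = handlers.get(part.getTagName());
            if (handler != null) {
                reply.append(handler.apply(part)).append('\n');
            }
        }
        return reply.toString();
    }

    // Helper for the demo: parse a string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed message", e);
        }
    }
}

class SingleEntryPointDemo {
    public static void main(String[] args) {
        DocumentService service = new DocumentService();
        service.register("placeOrder", e -> "ordered " + e.getAttribute("sku"));
        service.register("cancelOrder", e -> "cancelled " + e.getAttribute("id"));

        // One message, two logical actions.
        String xml = "<message><placeOrder sku=\"A-1\"/><cancelOrder id=\"42\"/></message>";
        System.out.print(service.execute(DocumentService.parse(xml)));
    }
}
```

Adding a new logical action here means registering another handler and sending another kind of element; the signature of execute() never changes, which is the point of moving the interface into the data.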

I do take Werner's points on lifecycle to heart. That's one area I haven't paid a lot of attention to. What use is a create() method on a stateless session bean? There is none!

And then there's the actual document's data representation, which is something nobody really talks about that much. Is XML's data model really appropriate for most uses? Where does it break down, or become too complicated? Isn't it just hierarchies all over again - aren't relations still "the thing"? I remember sitting in a BearingPoint/KPMG briefing on EAI a few months ago and listening to one of their chief architects wax poetic about the cognitive studies IBM did in the late 1960s about the "folding" of data being a natural way that humans perceive and deal with information, and it's "interesting" how XML relates to that, that we may be going full circle back into a more hierarchical or network view of data. I wonder. I've become a bit of a relational/Oracle nerd in the past year, thanks in part to a well-known troublemaker on one hand and, on the other, to someone who is probably the most inspirational technologist I've come across (in terms of one man's ability to master his specialty), Tom Kyte.

A lot of this distributed computing stuff ignores that you could do a lot of commercial processing pretty cheap and fast on a relational database by just slapping the XML APIs on top. Oracle's done a nice job with this; I've seen examples where it takes only a few lines of code to expose a fairly large Oracle system to RSS or SOAP. From the theoretical side, Pascal and Date have been having troubles with this XML hype wave because it further drives the need for a complete implementation of the relational model out of the mainstream.

Werner mentions versioning as a key problem. This is something I've really found a dearth of discussion on. Some refer to namespaces as a versioning mechanism, but I've heard others vehemently oppose such an idea. A session at JavaOne 2003 proposed a mechanism that leveraged UDDI, which struck me as similar to how we do it today with MOM, distributed objects and LDAP.

Versioning should be fertile ground for innovation. XML effectively eliminates the need for "positional" semantics in messages, which was a problem with static dist-obj RPCs - adding extra elements or tacking other schemas onto a document should be a lot easier in the XML world. A couple of problems (being naive here): the pervasive use of xsd:sequence ensures there is ordering to elements (does there really need to be?). Furthermore, do people really design their schemas to be flexible (i.e. allow any namespace attribute or element to be tacked on in spots)? I see some evidence of this, but I doubt the discipline will be there in the mainstream.
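As a rough illustration of the non-positional point, here is a sketch in Java of a consumer that picks out the fields it knows by name and ignores both element order and anything tacked on later. The class name TolerantOrderReader, the order/sku/qty vocabulary, and the urn:example:ext namespace are all invented for the example; it also sidesteps schema validation entirely, which is exactly where a strict sequence would get in the way.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical consumer that reads fields by name rather than by position.
class TolerantOrderReader {
    // Returns only the un-namespaced child elements listed in knownFields.
    // Elements from other namespaces, and unknown elements, are skipped.
    static Map<String, String> read(String xml, Set<String> knownFields) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI/getLocalName
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Map<String, String> fields = new LinkedHashMap<>();
            Node n = doc.getDocumentElement().getFirstChild();
            for (; n != null; n = n.getNextSibling()) {
                boolean ours = n.getNodeType() == Node.ELEMENT_NODE
                        && n.getNamespaceURI() == null
                        && knownFields.contains(n.getLocalName());
                if (ours) {
                    fields.put(n.getLocalName(), n.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed message", e);
        }
    }
}

class TolerantReaderDemo {
    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("sku", "qty"));

        // The message as originally designed.
        String v1 = "<order><sku>A-1</sku><qty>2</qty></order>";
        // A later message: reordered, with an extension element from another
        // namespace and a new un-namespaced element the old consumer never saw.
        String v2 = "<order xmlns:ext=\"urn:example:ext\"><qty>2</qty>"
                + "<ext:giftWrap>true</ext:giftWrap><sku>A-1</sku><note>rush</note></order>";

        // An old consumer extracts the same fields from both messages.
        System.out.println(TolerantOrderReader.read(v1, known)
                .equals(TolerantOrderReader.read(v2, known)));
    }
}
```

The consumer only stays this relaxed as long as nothing upstream validates the message against a schema that forbids the extra elements, which is the discipline question raised above.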

Anyway, this has gone on for too long, but getting back to my original reason for writing this: in the end, I don't really think I've changed my mind too much about what I said about Don's stuff on SOA. Though now that I read his blog I realize that he was quoted out of context (silly me, should have known). I think web services are different from distributed objects, but SOA really is just distributed component-based development in new clothes with some more flexible underpinnings. Perhaps I'll change my mind tomorrow. :-)

Posted by stu at August 31, 2003 08:30 AM