In our last episode, a minor bun fight ensued about the way to think about convenience vs. correctness in distributed interactions.
I made a number of bold claims, such as "the vast majority of RPC systems, from RMI, to CORBA, have --all-- been focused on mapping the design of a system into programming language constructs that completely gloss over most of the issues inherent in distribution".
In the comments section, Dan Creswell asked, "Can you provide some example issues that RMI and CORBA gloss over please? I guess I'm asking are we talking latency, failure, something else?"
Similarly, Steve Jones asked, "Seriously I just don't know the people who worked like that. I was working on distributed systems throughout the 90s and in NONE of them did we consider local = remote."
To which I say, "yes latency, yes failure, but mostly semantics."
To use latency as an example, I have worked on at least two or three financial trading systems that involved 50 to 500 remote procedure calls to perform a single transaction, wreaking havoc on both performance and recoverability. Anyone who works in the wireless telecom industry is likely familiar with a certain telecom billing system's Java wrapper API, which was designed completely around performing approximately 2-3 network round trips for every field you needed to set in a transaction (which was a stateful session bean that talked to entity beans), leading to between 30 and 150 RPCs for every transaction. People wonder why so many business transactions failed, and why so much infrastructure was required to support such a small volume of business transactions.
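The arithmetic behind that complaint is worth making concrete. Here's a minimal sketch (all names and the 20 ms round-trip figure are my own illustrative assumptions, not the actual billing API) contrasting a chatty one-RPC-per-field interaction with a single coarse-grained call:

```python
# Hypothetical illustration: cost of chatty vs. coarse-grained remote calls.
# Assumes each remote call costs one network round trip of ~20 ms (a modest
# WAN latency); real numbers vary, but the ratio is what matters.

ROUND_TRIP_MS = 20  # assumed latency per remote call

def chatty_transaction(fields):
    """One RPC per field set, plus a begin and a commit -- the anti-pattern.

    Returns (number of remote calls, total latency in ms)."""
    calls = 2 + len(fields)  # begin + one setter per field + commit
    return calls, calls * ROUND_TRIP_MS

def coarse_transaction(fields):
    """All fields marshalled into one document-style call.

    Returns (number of remote calls, total latency in ms)."""
    calls = 1
    return calls, calls * ROUND_TRIP_MS

if __name__ == "__main__":
    fields = {f"field_{i}": i for i in range(50)}
    print("chatty:", chatty_transaction(fields))   # 52 calls, 1040 ms
    print("coarse:", coarse_transaction(fields))   # 1 call, 20 ms
```

Fifty fields turns into over a second of pure network wait per transaction in the chatty design, before any server-side work happens, and every one of those 52 calls is an independent opportunity for partial failure.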
The problem is that, as I believe Steve Jones has pointed out before, there are too many projects without any real experts available on them. The developers building the systems (which involve billions of dollars of transactions) aren't aware of the innate issues of distribution, and so they build the network interactions with the approach they know best -- local object-oriented programming. This is also why vendors continue to build tooling that doesn't deviate from local object-oriented programming: it's risky to try to shift the mindset of developers to something different.
The semantics area is one that's been discussed many times with regards to REST's uniform interface, so I won't repeat it here, though we could dig into it more if some would like to.
Then, Steve Jones noted the following:
This is my issue. At the moment people are chasing a perceived technical perfection over a practical set of improvements. RPC works sometimes, REST works sometimes, Messaging works sometimes, flying monkeys only work on certain edge cases. The real focus should be how to think about the overall business problem, the continual belief that the next technology will make IT more successful is a big part of the problem.
I have run into this opinion from Steve several times in the past. To me, it is almost a form of technology fatalism. Many IT executives that I know and respect share this opinion. So, I respectfully disagree, and will try to explain why.
I agree that technology alone will not make IT successful.
I do not deny that there are variable levels of skill and productivity among IT workers.
I reject the notion that ensuring that all IT workers understand the business better, though highly desirable, is a realistic solution to the problem of IT failure.
I also reject the notion that we cannot raise the bar of productivity across the board through better technology, for a variety of reasons that I will explain next.
Firstly, there will always be significant numbers of IT workers that do not, and cannot (!), understand business problems to a greater degree than technological problems. This is due to a variety of factors (educational, for one; engineers aren't usually schooled in those liberal arts that provide insights into humanity and society). The problem goes both ways: there will always be significant numbers of business people that are clueless with regard to technology, for better or worse (as one has only so much time in the day and thus must choose to be ignorant of some topics). It may be possible to fix this problem, but it is a problem for entire generations of workers, not something fixable with anything less than a 30-year time window.
So yes, if you don't understand the business, you're dead. But, there's a lingering problem: if your business problem relies on technology, and you're either not aware of the appropriate technology to solve your problem, or you screw the technology up, you're likely still dead.
In short, contrary to the popular saying, it's not just about the business. But it's also not just about the technology. It's about both. They reflexively impact and depend upon one another.
History seems to show this relationship. The development of the steam engine, the telephone, the railroad, the automobile, the television, the computer, the web, etc. have all shaped society in ways it did not expect. Few of these were driven by 'understanding the business better'. This is not to say that all IT projects are on the level of these innovations. It is, however, a recognition that major innovations come with major changes, some of which are hard to interpret for those stuck in the prior model. (e.g. Television impacted U.S. presidential politics as early as 1960, with the Nixon vs. JFK debates.)
The moral of this story is that the technology-influenced mindset and architectures we take to our business initiatives will have a direct impact on how the business performs. There's an old saying in software product development: "TEAM = PRODUCT", also known as Conway's Law. Technology is often a necessary, even if not sufficient, enabler for those other, more important business changes. And much of that enablement is emergent behaviour, i.e. not directly traceable to the functional implementation of a business requirement. Rigid, brittle technology, mixed with a dose of fear, can often provide enough inertia to kill the best-laid hopes of broader change.
So, tying back to our discussion of network interactions and RPCs. The problems I have with suggestions such as "an expert can make RPC (or EJB, or WS-*, or ...) work well in practice" are:
a) they're generally correct in their outlook, for a particular audience;
b) they sometimes use the Internet's architecture and principles as an example of the right way to do things;
c) but they expect every IT department to build their own version of the Internet's architecture with the toolboxes that vendors give them.
Yet the vendors themselves are stuck in the approaches and techniques of yesteryear (old wine in new bottles), and they have nearly always worked to actively dissuade adoption of alternatives to their approach, or even to destroy those who support new alternatives. Technology debates and standards bodies become proxy wars for market competition, and progress in those debates and standards is often more about fixing the political situation than the actual technology itself. How is an IT department supposed to know any better if industry leaders aren't raising the bar for them?
The problem with approaches like "yet another RPC stack" is that they are not catalyzing changes in technology architecture much beyond where they were in 1984. Similarly, the WS-* world has not been catalyzing changes in technology architecture much beyond where it was in 1998. It's still, basically, component software, just now focused on protocols of runtimes. Much of what we have is improved tooling for Visual COBOL. And there are even signs that the vendors are moving back to emphasizing runtimes.
In summary, I think improving technology is necessary along with (not instead of) an increased understanding of business problems, because I think they help each other out. IT's business value basically comes down to providing tools and catalytic ideas to change the transaction cost structure of a firm. You can't do it without the business, but you often won't have the germ of the thought without innovative technology and associated architectures.