A bit of a Twittergasm lately over the CAP Theorem, with relevant blog posts
- You Can't Sacrifice Partition Tolerance by coda hale
- Problems with CAP by Daniel Abadi
- Clarifications on the CAP Theorem by Mike Stonebraker
- Reactions to Coda's CAP Post by Jeff Darcy
- Someone is Wrong on the Internet by Jeff Darcy
I am confused. I've narrowed it down to 7 matters.
Confusion #1: Changing system scope from underneath our feet
The biggest confusion I have is that the scope of "distributed system" under the CAP theorem seems to have changed, silently.
If you look at the original papers:
- Lessons from Giant-Scale Services by Brewer
- Harvest, Yield, and Scalable Tolerant Services by Fox & Brewer
- Toward Robust Distributed Systems presentation by Brewer
- Brewer's Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Services by Gilbert & Lynch
All of them seem to scope the CAP theorem to the server side of a client/server relationship. That is, it's assumed that the distributed system under consideration is a client/server architecture, but the server-side is distributed in some way. This is a reasonable assumption, since the Web is a client/server architecture variation, and CAP was originally about the tradeoffs in huge scale web sites.
But practically speaking, this is particularly important when considering Partition Tolerance. In the original papers, CA (Consistent, Available) was absolutely common, and in fact, the most common scenario, with Brewer giving examples like cluster databases (on a LAN), single-site databases, LDAP, etc. The most popular commercial cluster databases today arguably retain this definition of CA, such as Oracle's RAC.
But now, AP (availability, partition tolerance) advocates are questioning if CA distributed systems are even possible, or merely just something that insane people do. It's a very strange, extreme view, and I don't think it offers any enlightenment to this debate. It would be good to clarify or at least offer two scopes of CAP, one within a server scope, and another in a system-wide scope.
Confusion #2: What is Partition Intolerance?
This is never well defined. Like lactose intolerance, the results, can be, err, unpredictable. It seems to imply one of: complete system unavailability, data inconsistency, or both. It's also unclear if this is recoverable or not. Generally it seems to depend on the implementation of the specific system.
But because of this confusion, it's easy to let one's mind wander and say that a CA system "acts like" an AP or CP system when experiencing a partition. That leads to more confusion as to whether a CA system really was just a CP system in drag all along.
Confusion #3: Referring to Failures Uniformly
It seems to be a simplifying assumption to equate failures with network partitions, but the challenge is that there are many kinds of failures: node failures, network failures, media failures, and each are handled differently in practice.
Secondly, there are systems with many different kinds of nodes, where they tolerate partitions in one kind of node, but not another kind, or just maybe just not a "full" partition" across two distinct sets of nodes. Surely this is an interesting property, worthy of study?
Taking Oracle RAC or IBM DB2 pureScale, as an example. It actually tolerates partitions of many sorts - cluster interconnect failures, node failures, media failures, etc. What it doesn't tolerate is a total network partition of the SAN. That's a very specific kind of partition. Even given such a partition, the failure is generally predictable in that it still prefers consistency over availability.
Confusion #4: Rejecting Weak vs. Strong Properties
Coda hale sez: "You cannot, however, choose both consistency and availability in a distributed system [that also tolerates partitions]".
Yet this seems very misleading. Brewer's original papers and Gilbert/Lynch talk about Strong vs. Weak consistency and availability. The above statement makes it sound as if there is no such thing.
Furthermore, most distributed systems are layered or modular. One part of that system may be CA-focused, the other may be AP-focused. How does this translate to a global system?
Confusion #5: Conflating Reads & Writes
Consider the Web Architecture. The Web is arguably an AP system for reads, due to caching proxies, but is a CP system for writes.
Is it useful to categorize a system flatly as binary one or the other?
Confusion #6: Whether AP systems can be a generic or the "default" database for people to choose
AP systems to my understanding are application-specific, as weak consistency is difficult to correct without deep application knowledge or business involvement.
The question and debate is, what's the default database for most IT systems... should it be a traditional RDBMS that's still a CA system, like MySQL or Oracle? Or should it be an AP system like a hacked-MySQL setup with Memcached (as AP), or Cassandra?
When I see claims that CA advocates as "not getting it", or "not in their right mind", I have a severe case of cognitive dissonance. Most IT systems don't require the scale of an AP system. There are plenty of cases of rather scalable, highly available CA systems, if one looks at the TPC benchmarks. They're not Google or Amazon scale, and I completely agree that an AP system is necessary under those circumstances. I even agree that consistency needs to be sacrificed, especially as you move to multi-site continuous availability. But it's very debatable that the basic database itself should be CA by default or AP by default.
For example, many IT shops have built their mission-critical applications in a CA manner within a single LAN/rack/data-centre, and an AP manner across racks or data-centres via asynchronous log shipping. Some can't tolerate any data loss and maintain a CP system with synchronous log shipping (using things like WAN accelerators to keep latency down). And not just for RDBMS - this is generally how I've seen Elastic Data Cache systems (Gigaspaces or Coherence or IBM WebSphere ExtremeScale) deployed. They deal with the lack of partition tolerance at a LAN level through redundancy.
It's useful to critique the CA single-site / AP multi-site architecture, since it's so common, but it's still not clear this should be replaced with AP-everywhere as a useful default.
Confusion #7: Why people are being a mixture of dismissive and douchey in their arguments with Stonebraker
Stonebraker's arguments make a lot of sense to IT practitioners (I count myself as one). He speaks their language. Yes, he can be condescending, but he's a rich entrepreneur - they get that way. Yes, he gets distracted and talks about somewhat off-topic matters. Neither of those things implies he's wrong overall. Neither of these things imply that you're going to get your point across more by being dismissive or douchey, however good it feels. (I'm not referring to anyone specifically here, just a general tone of twitter and some of the snark in the blog posts. )
One particular complaint I find rather naive - that he's characterizing CAP theorem (more AP system) proponents as only using CAP theorem to evaluate distributed system availability. Clearly that's not true, but on the other hand, it is kind of true.
Here's the problem: when I hear venture capitalists discussing the CAP theorem over cocktails because of their NoSQL investments, or young newly-graduated devs bugging their IT managers to use Cassandra, you can bet that plenty of people are making decisions based on a twisted version of the new fad, rather than reality. But that's how every technology fad plays out. I think Stonebraker has seen his share of them, so he's coming from that angle, from my view. Or he could just be an out of touch dinosaur like many seem fond of snarking. I personally doubt it.

Problems with CAP link seems broken.
Fixed, thank you.
I mostly agree with what you say here. As I'm sure you've noticed, I'm among those who believe that CA systems are possible, but that they have highly undesirable behavior when partitions do occur. The key is that they do. I used to work on HACMP, specifically on the network monitoring and routing pieces, and in even the most carefully designed local networks there would still be glitches that resulted in actual or effective partitions. With sufficient redundancy these could be reduced to a low enough frequency that the cost of such failures was less than the cost of implementing a different kind of system. IBM wouldn't have sold the tens of thousands of HACMP licenses that they did if CA weren't real and useful.
Nonetheless, the possibility of partition does still exist and tradeoffs that have changed in the fifteen years since then leave a lot more people in the "AP world" than the database die-hards seem to think. It's not just Facebook and Google. As I pointed out in one of my posts, there are dozens of applications *within* Facebook with a million-plus monthly users, which surely puts them at a scale where the tradeoffs change. Those are matched by many more free-standing web applications, which in turn are matched by many more private ones. This might not be the majority model yet, but it is most certainly a common one.
All that said, I think the model of CA within a data center and AP between is not a bad one. In a sense, you're treating a site as a single node with a very low probability of failure and internal latency, but with a much higher latency and probability of disconnection from other nodes.
Thanks for your response.
Amusing side story: I'm working with a customer whose major system uses HACMP (and a couple other technologies, leading to a "not very HA" result). They've had major outages lately -- all due to a mix of capacity problems (SMP systems ftw), bad code, and human error. But they're moving over time to a multi-site continuous availability layout. The architecture is studied by people that don't know much about the CAP debates and the decision on consistency relaxation goes all the way up to the CIO and CFO. Who, given the politics, are much more willing to throw money at the problem (WAN accelerators, synchronous log shipping, the whole bit) than to accept consistency problems, even if repairable. At some point they'll have to relent, the question is... what is that point?
Another anecdote: Amazon's most common, standard database internally used is.... Oracle. It's littered all over the place, at least as of 2008, to my knowledge. Shouldn't be a surprise as Brewer said the same thing back in PODC 2000, they use CA systems to hold the money. The common case is an AP "macro" distributed system for the website, with modules & layers of CA systems (Oracle) and AP systems (Dynamo-ish), and probably CP systems here and there (e.g. the clustered file systems used for Oracle RAC, etc.).
HACMP + Oracle is absolutely CP in drag when looking across the two databases, because of the quorum enforcement. Even today, PowerHA is what IBM promotes in a virtualized environment because the PowerVM live migration doesn't work in failure conditions.Oracle RAC and other shared-disk cluster databases like DB2 pureScale or ScaleDB on the other hand, are different IMO. In the case of RAC, the cluster interconnect (say it with me in a breezy female voice... "Cache Fuuuuusion") is really just an optimization -- Oracle 8i parallel server didn't need it and just used the SAN interconnect to "ping" other nodes to let go of block ownership. Of course it was slow as molasses for hot block writes, but this is to Daniel Abadi's point that we should include latency in our pedagogical acronyms. I haven't dug in practice as to how much latency this sort of failure condition would add to today's RAC write operations, I'm sure the answer is "a lot"...