February 01, 2005

Building the new database, pt 1

Sometimes I just don't have time to keep up with the pace of conversation in the blogosphere. Perhaps most of its members can pull it off because they're pro-am pundits or journalists :-)

Anyway, related to this, I have a few thoughts brewing on that database debate that Adam Bosworth kicked off a few weeks ago, about how database vendors are providing less of what customers want, and open source could fill the gap. I also caught the radio show where Bosworth & co suggest there should be an easier way to do it than how we do it today.

Here's the nutshell, speaking as an Oracle DBA and one-time object database nerd. It really is hard. It will get easier, but the baseline of knowledge on how databases actually work is so _low_ out there that it's going to take a while. And in terms of specific features (dynamic partitioning and modern indexing), vendors like Oracle *are* providing these things, and they're not tremendously hard to use; it's just that people don't bloody spend the time to learn them.

There's a cultural problem in the database community at work here -- there is too much emphasis on "operations" and not enough on "development" and "play". AskTom.oracle.com is probably the best example of a "DBA playground", in terms of the attitude of information sharing and trying out ideas -- and it is quite inspiring as to what one can do, very productively, with modern databases.

There's also confusion about the basic assumptions of how one achieves scalability and reliability. If you're interested in this space, read (or re-read) In Search of Clusters for a feel of how this idea has evolved. There are many biases and perceptual challenges here. For example, Adam's use of the word "partitioning" already hints at a bias towards a particular style of parallelism (shared nothing), something that may be more applicable to Google's case than to Federal Express's. Few cluster architectures are "general purpose" enough to fit all cases (though Oracle argues that shared-disk and RAC are general purpose 'enough').
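To make the "shared nothing" bias concrete, here is a minimal sketch in Python. It is purely illustrative: the names `node_for_key` and `SharedNothingCluster` are made up for this post, the "nodes" are just in-memory dictionaries, and no real product works exactly this way. The point is only that each row has exactly one owner, so keyed lookups touch a single node while anything not keyed has to visit all of them.

```python
import hashlib


def node_for_key(key: str, node_count: int) -> int:
    """Pick the single node that owns this key (hypothetical hash scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class SharedNothingCluster:
    """Toy model: every node holds its own private slice of the data."""

    def __init__(self, node_count: int):
        # Each dict stands in for one node's private storage.
        self.nodes = [dict() for _ in range(node_count)]

    def put(self, key: str, row: dict) -> None:
        # A write goes to exactly one node, the owner of the key.
        self.nodes[node_for_key(key, len(self.nodes))][key] = row

    def get(self, key: str):
        # A keyed read also touches exactly one node.
        return self.nodes[node_for_key(key, len(self.nodes))].get(key)

    def scan(self, predicate):
        # A query that is not on the partition key must ask every node.
        return [row for node in self.nodes
                for row in node.values() if predicate(row)]


if __name__ == "__main__":
    cluster = SharedNothingCluster(4)
    for i in range(100):
        cluster.put(f"pkg-{i}", {"id": i, "weight": i % 10})
    print(cluster.get("pkg-42"))
    print(len(cluster.scan(lambda r: r["weight"] == 0)))
```

If most of your workload looks like `get`, this style scales nicely. If most of it looks like `scan`, or needs to touch rows owned by different nodes in one transaction, the partitioning choice starts to hurt, which is roughly where the shared-disk argument comes in.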

As for things like "dynamic schema", I am curious. Object databases like Gemstone provided this 10 years ago, and some companies, particularly utilities and container shipping companies, use schema evolution to great effect in their billing, routing, or trouble ticketing systems. But it wasn't enough for OODBs to catch on. Today, it's not a completely solved problem, but it's something that Oracle, for example, is working hard on. Every release they add new maintenance features that allow schema evolution without downtime -- first index rebuilding, then partition swapping, and now complete online table re-organization -- with only a quick table lock at the beginning and end of the operation. There's a whole discussion here about where abstraction should begin and end that I could get into (particularly about people who insist on building an abstract layer on top of their relational databases, which are already, guess what, an abstract layer on top of a filesystem).

Adam suggests that if these features do exist, vendors aren't explaining them or pushing them well enough. That may be true, but there's a deeper cause, I think. Generally I *do not* see these kinds of requests from most customers. They're having a hard enough time with 'static' requirements and techniques. Dynamic ones are too scary. Only the sophisticated customers, driven by deeply technical people, ask for these kinds of features. (These are the people one dreams of working for :)

Does Oracle listen to these people? Absolutely. The engineers know this stuff matters. But can they sell it in a marketing deck? It's a different audience. Perhaps that's why we don't hear about this stuff.

I'll expand on this in a future post.

Posted by stu at February 1, 2005 11:03 AM