What is a database model, you may ask. Can we even define it?

Yes, we can. It’s literally what it says it is.

Phew, at least this was easy. Definitions always make me nervous.

What does it take to be added to this prestigious list, I wonder?

[Image: Wikipedia's list of database models, with a crude arrow pointing at the end of the list and text saying "your model here". Caption: The intended effect of this post]

What are database models made out of?

Necessity, mostly.

Starting from ye olden days, the hierarchical model was brought into this world because something was needed that would work for laying out data on disk. It worked well enough, I presume, especially since it’s still around. But it didn’t really stick, because after a while, having to worry about the physical layout of the data was giving application developers headaches. And when developers get headaches, stuff happens.

The relational model was created when the High Energy Table Acceleration lab at IBM had a tuple radiation leak which hit the Disk Armature Testing Facility and E. F. Codd was caught in the blast[1]. The algebra that came out of this fortuitous accident was nothing short of genius, in this blogger’s opinion. It created an abstract data model while also being optimized for the hardware of the era. And, being algebraic and all, it allowed for optimizations upon optimizations, so performance could increase without the application developer ever having to worry about on-disk data placement and all that.

Genius, I tell you.

For a while after that, all was good with the world. When you have a hammer, everything looks like a nail, but when the hammer is amazingly good, well, then, maybe it’s OK to keep hitting things with it until they work.

ACID, BASE, who’s keeping score anyway

The problem with the relational model was, of course, that it was built for the hardware of the 80s, using the compiler tech of the time. The astute reader will note that we are no longer in the 80s, and we should rightfully ask if we can do better.

And maybe, just maybe, since we have all these new languages with Objects and Threads and everything, maybe we can learn something from those. Nobody wants to build another ORM.

And if we could get away from SQL with its insistence on this ACID thing which makes things slow, that would be a nice bonus too, I guess. Correctness is overvalued anyway, not to mention hard to understand.

Did I say that it’s slow?

Not having to fight Oracle and IBM on their home turf, that’s probably a good idea too.

Just like that, we got documents, keys and values, triples, XML and graphs - every data structure ever created was made into a database model.

But did they work?

They did something better than work. They Scaled(TM).

And sometimes they worked.

It’s all about the APIs

A lot of these NoSQL products focused on ease of use and the perception of Scale(TM). That’s a good business strategy because everyone understands ease of use, and everyone thinks they need scale but no one really does. So if your product is easier than SQL and fits with Agile processes and JavaScript codebases, then you have a winner. Mongo and Redis both suffered from huge technical drawbacks compared to the standard set by SQL server software, but that didn’t matter - developers adopted them like hot cakes because they made sense.

The lesson? Elegance is not simplicity and simplicity wins every time.

And another lesson - marketing is everything when you’re the underdog. Does it make technical sense to compare the scale of deployments between a transactional system like IBM’s DB2 and a non-transactional system like Mongo? Of course not. But you can do it anyway and, evidently, it will work. Even today, people will say that Mongo scales because of shards and that transactions just slow things down. Interestingly, parts of that last sentence are actually true.

What’s missing?

All these systems were built primarily to make money by filling a niche: modern application stacks that relational systems had only somewhat patched. Overall, I think they did a good job. We saw some interesting activity in the area and we even got some impressive developments in distributed systems and storage technology.

But I don’t think they were particularly well built. Instead of improving on fully ACID systems, they ignored them as if they were stone age tech. And I know firsthand how much people hate locking and transactions - at least, until they understand that almost all of their data corruption is caused by the hoops they jump through to skip basic safety checks.

I don’t know if you know this, but backups are a waste of space too. Until they’re the only thing that can save you.
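To make the point about skipped safety checks concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are made up for illustration and stand in for whatever your application actually does. The only thing being shown is that wrapping both updates in one transaction means a failed transfer leaves no half-finished state behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst atomically."""
    try:
        # `with conn:` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The safety check: refuse to overdraw, undoing the debit above.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


# Made-up data, kept in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 70)      # enough money, goes through
failed = transfer(conn, "alice", "bob", 70)  # not enough, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Without the transaction, the second call would have debited alice and then bailed out before crediting bob, which is exactly the kind of corruption people later blame on something else.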

There’s always room for more

What I want to see is fast transactional systems built for modern SSDs. Not for VMs, not for cloud deployments and not for analytics. I want a fully ACID system that is intuitive, simple and fast. I don’t care about lots of features or scale-out. And I want to do that with a new database model because that looks like fun. And I want it to be small.

So I have started building one.

  1. Wikipedia’s history entry omits these details.