TL;DR

This document argues for a new kind of database management system, built from the ground up to support a massively decentralized Internet, consisting of a huge number of small communities.

Such a DBMS must be:

  • Built to support scale primarily through federation
  • Simple to operate and built for commodity hardware
  • Open in its code and its design, and licensed against commercial use.

The rest of the document argues for this position.

All software is political

All software is, first and foremost, a social construct, something built by people for people. Its existence is justified only as a tool that integrates with and facilitates human activity. And, just like all tools, it says things about its creator. The people the software serves, the processes it facilitates and the way it integrates are all reflections of the beliefs of the author.

What assumptions does the author make about their users? Do they charge money? How much? What languages is the documentation available in? Does the software require expensive hardware to run?

In short, who is their audience and what do they assume about them?

Questions like these reveal a lot about the author’s views, opinions and intentions, which makes software as political as any other form of speech.

DBMSs are political too

On the Internet, code loses its value rapidly. It is reverse engineered or replicated almost instantly and the creator can do little to prevent it. But data is different. Data keeps its value as long as it is kept behind a moat, and that value increases the more data is accumulated.

DBMSs are the principal means of extracting that value - mostly invisible, but the primary money maker. So, what is their political stance?

“But does it scale?” - and other euphemisms for capitalism

DBMSs available today are built to support enormous, centralized, owned datasets, at immense complexity and cost.

Their creators want their software to “scale”, a term which almost always implies the ability to efficiently process large amounts of data.

And why is that necessary? What kind of applications require virtually unlimited storage?

Ignore scientific use cases. These are big datasets that are primarily searched, not mined - a solved problem.

In practice, data “at scale” means extracting tracking information, directly or indirectly, from user activity streams. That’s why companies need hundreds of terabytes (in 2023) of transactional storage, and analytics capabilities to match. What other insight can be found in a food ordering app or whatever the latest Silicon Valley fad is?

This surveillance go-to-market strategy is a profitable business model and the main monetization avenue for the majority of Silicon Valley style startups. At its core is the assumption that it is acceptable for any single organization to hoard unlimited amounts of data.

These days, “AI”/Machine Learning model training is having its day in the sun, but how long that will last and how it will evolve is mostly unknown. As a data storage problem it presents its own challenges, but the state of the art, at the time of writing, is relatively primitive. In any case, these datasets are just another monetization avenue for data hoarding. The difference is that until the emergence of large ML models, monetization relied on extracting mostly metadata from the social graphs of users. Machine learning tries to tap into the actual data while sidestepping thorny ownership and copyright questions. It moves the market from surveillance capitalism to data capitalism.

All of this goes against the needs of the people that use the software. Such DBMSs make people serve the company, violating the basic human rights of users for the company’s benefit.

Commodity hardware is all you need

A desktop grade machine, in 2023, can host on the order of 100TB of data. That bound does not come from the limits of storage technology, but from the processing power required to efficiently manage that amount of data. Above that, deployment on large scale hardware becomes necessary.

Instead of running on real hardware, however, the trend is to move the architecture of DBMSs to be “cloud native”. This, in effect, means abandoning control of all software and hardware that an application depends on. Applications live in the world of Virtual Machines and hypervisors, of fictional hardware and imaginary operating systems, without any hope of knowing how anything works. The complexity is immense, the monetary cost disproportionate and the environmental impact hard to exaggerate.

The result is opaque software that is extremely complex to administer and over-engineered for the vast majority of useful applications.

But most people don’t need this scale or the complexity that comes with it. They just need software that is easy, cheap and safe to run, whether it is serving an online shop or a small community.

A new breed of software license

GPL style licenses are a product of the 80s, when software was trying to stand on its own as a commodity. To do that, it had to take “a view from nowhere”, courting the favor of both software engineers and profit-seeking companies. This strategy was successful in commodifying software, but the exchange was uneven. What we see today is that huge for-profit entities take advantage of the work of open source developers, while sharing none of the profits. The GPL was designed for this, and it worked.

We need a new software license, one that promotes sharing but makes hoarding difficult. A license alone cannot dismantle capitalism, of course, but it can send a strong signal. This is the hardest question in this document because, as of this writing, no such license is sufficiently adopted or tested.

Towards a decentralized, federated Internet of people

These principles apply to all software, backend or frontend. Enabling small use cases and working against profiteering and hoarding are essentially the same design goal. We need software that is built to create wealth by allowing communities to interact and share information, not by enabling data accumulation. Software engineering and architecture, means of distribution and licensing cannot achieve this goal, whether alone or together. They are just tools that provide guidance and a small degree of protection to the communities that build and use the software.

In the end, the main idea of this document is the need and desire for a new era of software that is built to serve people and communities.

And that’s how software is political.