Actually, I love RDBMS (Relational Database Management Systems). I have used them for many, many years, and have got a huge amount of value out of them in various projects.
I’ve learned all the skills around normalising data and then denormalising to optimise the database for various use cases.
I’ve spent time working out how best to ensure that our systems get access to the right data at the right time.
I really love RDBMS in so many ways (except I don’t really like Oracle).
But, in my view, there is an issue with how RDBMS get chosen:
“Always use RDBMS”
Over the last 15 years, many different technologies have sprung up around RDBMS. One of the most successful use cases for the RDBMS is the backend to a website.
And the web frameworks have built ORMs to make it easy to couple a system’s object model to its data model.
That isn’t a particularly good use of an RDBMS, but data model/ORM coupling works out OK a lot of the time for smallish projects.
And there are many frameworks that make it simple to do.
In fact, many frameworks don’t even require you to use an RDBMS as the backing to your object model.
But the majority of people building these systems still do this.
And it isn’t very good.
In the context of ORMs and frameworks, the RDBMS becomes a hidden part of the process, an abstracted problem.
And when something complex is abstracted away, awareness of that complexity tends to be lost.
Use the right tool for the job
So, just to be absolutely clear, there are times when an RDBMS is exactly the right tool for the job.
However, the problem, as I see it, is that it only works particularly well in specific contexts.
One context it works brilliantly in, is a long-lived server based context, where you can do things like decent connection pooling.
We have had that context for most of the solutions delivered in the last 15 years. Servers have been ubiquitous and as far as a data workhorse for these systems goes, RDBMS has absolutely been the king for most of that time.
However, one context where RDBMS does not have as much value is in a scenario where a lot of short-lived connections are made over and over again.
Because you end up with either the overhead of connecting and disconnecting (slow), or the overhead of connections not being dropped and the server reaching its maximum connection limit.
And FaaS (as has been put forward by most of the vendors) is exactly that type of scenario.
And that makes the data storage and retrieval question a bit (a lot!) more complex.
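To make the connection problem concrete, here is a minimal sketch. The `FakeConnection` class and its timings are illustrative stand-ins of my own, not a real database driver. It contrasts the two styles of connection handling you see in FaaS functions: connecting inside the handler, where every invocation pays the handshake, versus connecting at module load, where only a cold start pays it.

```python
import time

class FakeConnection:
    """Illustrative stand-in for a database connection; __init__ is the costly part."""
    total_connects = 0  # count every handshake across all instances

    def __init__(self):
        FakeConnection.total_connects += 1
        time.sleep(0.001)  # simulate TCP + auth handshake latency

    def query(self, sql):
        return "row"

# Style 1: connect inside the handler.
# Every single invocation pays the handshake, and under load the
# database server sees a flood of connects and disconnects.
def handler_per_invocation(event):
    conn = FakeConnection()
    return conn.query("SELECT 1")

# Style 2: connect at module load.
# Only a cold start pays the handshake, but the connection is then
# held open for the lifetime of the instance -- one per concurrent
# instance, all counting against the server's connection limit.
_conn = FakeConnection()

def handler_reused(event):
    return _conn.query("SELECT 1")
```

In a long-lived server, pooled connections make this a non-issue. In FaaS you get to pick your overhead: the per-invocation handshake, or one held-open connection per concurrent instance eating into the server’s maximum connection limit.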
Not can but should
Whenever someone in my team suggests we use an RDBMS, I push back as hard as possible. It isn’t that I think they are bad, but that they are built for a purpose, and our underlying platform lowers the validity of that choice.
Of course you can use an RDBMS with FaaS, but the question is not can you but should you?
There is a tendency when developing to need a “datastore” and the first thought is “let’s use an RDBMS” because that’s a safe place for a developer to go.
And RDBMS has hidden a lot of the complexity around how we should store and retrieve data for many years.
I think developers of Serverless systems need to go back (forward?) and learn data-driven design.
The easiest way of explaining this is with an example.
Amazon had a problem in 2004: spikes in their holiday-season traffic were causing outages. The commercial offerings were not delivering what they needed, so they addressed this by developing new technology in-house.
Amazon Takes Another Pass at NoSQL with DynamoDB - ReadWrite
Amazon's Dynamo paper (PDF) is the paper that launched a thousand NoSQL databases, if you'll pardon a twisted metaphor…
So they built a data storage solution that gave them reliable storage of data with predictable latency even at scale.
They sacrificed the ability to deliver data blindingly fast in exchange for the ability to deliver highly distributed, reliable data with a known and consistent latency.
That’s why DynamoDB exists (that’s the theory anyway).
DynamoDB is a technology built for scale. Scale is inherent in the way it is coded and utilised.
In other words, if there is any way your solution may reach any sort of scale, then DynamoDB or an equivalent is probably a good choice.
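A large part of how a Dynamo-style store keeps latency predictable is that it hashes each item’s partition key to spread data and traffic across partitions. The sketch below is a toy simplification of that idea (the eight fixed partitions and the MD5 hash are my assumptions, not DynamoDB’s actual scheme): a high-cardinality key spreads load evenly, while a low-cardinality key funnels everything onto a few hot partitions.

```python
import hashlib
from collections import Counter

PARTITIONS = 8  # illustrative; the real service manages partition counts itself

def partition_for(key: str) -> int:
    """Hash the partition key to choose a partition (toy version of the idea)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % PARTITIONS

# A high-cardinality key (e.g. a user id) spreads 10,000 writes
# roughly evenly across all partitions...
good = Counter(partition_for(f"user-{i}") for i in range(10_000))

# ...while a low-cardinality key (e.g. a country code) can only ever
# touch as many partitions as it has distinct values.
bad = Counter(partition_for(country) for country in ["UK", "US", "DE"] * 3_333)
```

This is why learning your data-driven design matters here: access patterns and key choices have to be decided up front, rather than left to a query optimiser after the fact.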
Knowing your tech
So as a developer building a solution, one of the things you have to now consider is data storage at scale.
And unfortunately, that almost certainly means not using RDBMS by default.
RDBMS can be used as a part of the solution, but I would argue strongly that it should not be your major data solution for the scalable element of your system.
And when you’re talking about web systems, especially B2C or PaaS systems, then your business model is almost certainly about attempting to deliver scale.
And if your business model has scale as a central element, then you’d better be very certain that RDBMS fits your model before you start building, because migrating away from RDBMS is far more painful than migrating towards it.
Developers cannot (and should never have been able to) get away with “RDBMS will do for now” any more. We are beyond that, and scale can happen far faster than it used to.
I still love RDBMS. But I love RDBMS like you love an old dog that can faithfully go on a walk with you, but no longer does the tricks that younger dogs can.
Know your tools. Know your tech. Learn better data design. Pick an RDBMS when it is the right thing to use. Don’t pick an RDBMS because that’s what you know.
I have spent a long time weaning myself off RDBMS for scalable systems. It’s been one of the most valuable lessons for Serverless systems that I’ve learned (alongside not needing servers).