A friend recently engaged me for a small project. It was a relatively simple one: the company had issues with a system that had already been developed. The system is typical of many that I have seen built.
It is primarily a front end (something like Vue.js or React) with a back end built on Node.js using Express or Koa, plus a database.
Simple. Nothing complicated there. Except there was…
Despite being hosted on AWS, and despite all the discussion about being “in the cloud”, the whole system ran on a single EC2 instance… including the database… and all the backups.
Now, this is not a criticism of the original development team. I have seen this many times from many different companies. If you are not aware of how to build in the cloud, and you simply follow online tutorials on how to set up a server, then this is essentially what you can end up with.
The problems were simple:
- if the EC2 instance (for any reason) fails, the data backups disappear 😱
- the security group had port 22 open to the world 😱
- the EC2 instance is not even behind a load balancer
After that, there was basically no point looking much further.
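The second problem in that list, port 22 open to the world, is the kind of thing that can be checked for mechanically. Here is a minimal sketch in TypeScript, assuming a simplified rule shape: `InboundRule`, `findWorldOpenSsh` and the field names are illustrative and not an AWS API. It flags any inbound rule that exposes SSH to `0.0.0.0/0` or `::/0`.

```typescript
// Simplified, hypothetical shape for an inbound rule, loosely modelled on
// what a security group holds: a protocol, a port range and CIDR blocks.
interface InboundRule {
  protocol: string; // "tcp", "udp", or "-1" meaning all traffic
  fromPort: number;
  toPort: number;
  cidrs: string[];
}

// CIDR blocks that mean "anyone on the internet" (IPv4 and IPv6).
const WORLD_CIDRS = ["0.0.0.0/0", "::/0"];

// True when the rule lets the whole internet reach the given TCP port.
function exposesPortToWorld(rule: InboundRule, port: number): boolean {
  const allTraffic = rule.protocol === "-1";
  const protocolMatches = allTraffic || rule.protocol === "tcp";
  const portInRange =
    allTraffic || (rule.fromPort <= port && port <= rule.toPort);
  const openToWorld = rule.cidrs.some((c) => WORLD_CIDRS.indexOf(c) !== -1);
  return protocolMatches && portInRange && openToWorld;
}

// Returns every rule that leaves SSH (port 22) open to the world.
function findWorldOpenSsh(rules: InboundRule[]): InboundRule[] {
  return rules.filter((r) => exposesPortToWorld(r, 22));
}

// Example: only the first rule should be flagged.
const exampleRules: InboundRule[] = [
  { protocol: "tcp", fromPort: 22, toPort: 22, cidrs: ["0.0.0.0/0"] },
  { protocol: "tcp", fromPort: 443, toPort: 443, cidrs: ["0.0.0.0/0"] },
  { protocol: "tcp", fromPort: 22, toPort: 22, cidrs: ["10.0.0.0/16"] },
];
const exampleFlagged = findWorldOpenSsh(exampleRules);
```

Opening 443 to the world is normal for a public site; opening 22 to the world is not, which is why the check is specific to the port rather than to the CIDR alone.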
It was enough for me, on first look, to basically say “can I fix this now please?” and then worry constantly until they said yes.
It may seem trivial to say it but the solution in black and white is relatively simple:
- Move the database (postgres) to RDS
- Change the code to use this new database
- Make the EC2 instances load balanced and auto scaling
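The second step, in its simplest form, means the application stops assuming the database lives on the same machine and reads its connection details from configuration instead. A minimal sketch in TypeScript, assuming hypothetical environment variable names (`DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`) and helper names that are not taken from the project:

```typescript
// Connection details the application needs, wherever the database lives.
interface DbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

// Minimal view of an environment: variable name to value (or undefined).
type Env = { [name: string]: string | undefined };

// Fail loudly instead of silently falling back to a local database.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build the database config from the environment.
// The variable names are assumptions made for this sketch.
function buildDbConfig(env: Env): DbConfig {
  const rawPort = env["DB_PORT"];
  // 5432 is the default PostgreSQL port.
  const port = rawPort === undefined ? 5432 : parseInt(rawPort, 10);
  if (isNaN(port)) {
    throw new Error(`DB_PORT is not a number: ${rawPort}`);
  }
  return {
    host: requireVar(env, "DB_HOST"),
    port: port,
    database: requireVar(env, "DB_NAME"),
    user: requireVar(env, "DB_USER"),
    password: requireVar(env, "DB_PASSWORD"),
  };
}
```

In a Node.js app this would be called as `buildDbConfig(process.env)`, with the result handed to whichever Postgres client the codebase already uses. Refusing to start without an explicit host is a deliberate choice here: it makes it harder to point the wrong environment at the wrong database by accident.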
Unfortunately, with an existing and running site, this is not trivial. The database side of things took longer than expected.
When things aren’t built for the system you’re putting them into…
RDS took longer than I wanted.
Don’t get me wrong. Setting up an RDS database is easy. That took minutes.
Testing it with an existing codebase that exists only on a single server is hard.
And here’s the thing: the codebase was written to expect to run only on a single server, and with the database on the same server.
And that was a problem because just changing the database server over worked, but only partly.
There was, it turns out, both test and live data, and switching over required the right data to be in the right place. Apparently (I wasn’t aware until afterwards) the wrong data had gone into the test database, and switching it over caused a race condition on the server that took it down.
The other issues were all the customisations that had been put into the database. Transferring all of those, including user permissions, into the new RDS database was relatively hard.
Testing the transfer was hard as well, as the system hadn’t been built with a robust set of tests.
And that’s the thing. Nobody had built it with the thought that there might be a need in the future to ever do something like this.
I naively thought that, since most people in the last 20 years have experience of building 3-tier architectures, this would be easy.
Once the switch had taken place though, the system worked significantly better, and the fact that the data is encrypted, backed up, and somebody else’s problem is a significant weight off my mind!
When it came to moving from the single server to a load balanced, auto scaled setup, I changed my approach to one of teaching the team.
Show them the right way instead
Instead of doing it for them, I showed them how to do it right by writing a PDF walkthrough with screenshots.
It seems clear to me that if someone isn’t comfortable building solutions for the cloud, expecting them to take ownership of a fully auto scaling solution without understanding it is a bad idea.
I could have put it into Elastic Beanstalk or similar but that would have required them to consider a rebuild and I didn’t want to introduce something else into the mix.
So this walkthrough explained how to put together EC2 Launch Templates, EC2 Auto Scaling Groups, ELB, CodeDeploy and their existing GitHub repo to easily deploy their existing codebase, using their RDS instance, as a simple auto scaling system.
This may sound complicated, but they already understand EC2 to a point, and GitHub. What they need help with is a simple deployment process and a simple and manageable scaling system without changing too much of their processes.
Once set up, their day-to-day work doesn’t change very much, apart from:
- Add an appspec.yml file to the repo so that CodeDeploy knows what to change on the servers
- Push changes to GitHub
- Manually create a deployment in CodeDeploy
This manual step could become automated at a later date, as needed. While many in the DevOps space will look at this and say this is slow, the key for me is that I’m taking the team on a step by step journey from where they actually are and not trying to make them “DevOps Legends” in one step.
Give me my Serverless!!!
I haven’t done a migration project like this in a long time. I had to ask a number of friends who do this kind of thing more regularly than me what the right technologies to use nowadays are.
It sounds strange, but this is not unusual. I’ve found companies thinking like this everywhere.
On Twitter, I talk to Serverless people all the time about how we should be approaching the cutting edge of the cutting edge, and I often forget that there are many out there who live nowhere near those conversations.
And yes, I can absolutely see how the above company could have built this whole solution better as a Serverless solution, but they don’t have the money to rearchitect their back end (I don’t imagine), and what would be the value anyway? It’s up and running, with paying clients. A rewrite at this point doesn’t seem worth the cost. Additional features may be a good fit for a Serverless approach, but not the whole thing if it’s all working.
All of it has been really painful: migrating to a new backend database, migrating servers even at this level of simplicity, and having to coordinate with other teams on something that seems so trivial but never is.
I’ve got used to the Serverless way of things. While migrating data is never easy, and is not trivial with an event driven approach either, it is certainly an awful lot easier to think about and plan for than a migration like the one in this story. You are automatically distributed and automatically event driven, and this forces you to think differently about data layers.
However, I also think that the wider tech community needs to do its part as well:
- to start to deprecate bad tutorials that show ways of doing things that we shouldn’t be doing any more
- to start to provide better context e.g. “if you’re on AWS EC2, this article is the more up-to-date way of doing it”
- to stop saying things are easy when they are not
- to stop thinking that “hello world” is all that matters with frameworks/libraries/OSS
- to stop telling everybody that Open Source is always best, because nothing is always best
We have a duty to help the next generations to build better solutions. If they build the solutions of yesterday on the tech of tomorrow, then no wonder we’re going to have to fix their issues.
Serverless is the future.
Of that I have no doubt.