Serverless Best Practices
Within the community we’ve been debating the best practices for many years, but there are a few that have been relatively accepted for most of that time.
Most serverless practitioners who subscribe to these practices work at scale. The promise of serverless plays out mostly with high-scale and bursty workloads rather than at relatively low volume, so a lot of these best practices come from the scale angle, e.g. Nordstrom in retail and iRobot in IoT. If you’re not aiming to scale that far, then you can probably get away without following these best practices anyway.
And remember that best practices are not “the only practices”. Best practices rely on a set of underlying assumptions. If those assumptions don’t fit your use case, then those best practices may not fit.
My main assumption is that everybody is building their application to be able to run at scale (even if it never ends up being run at scale).
So these are my best practices as I see them.
Each function should do only one thing
It’s about isolating errors and scaling at the level of the individual function.
Putting it another way, if you use a switch statement in your function, you’re probably doing it wrong.
A lot of tutorials and frameworks work on the basis of a big monolithic function behind a single proxy route and use switch statements. I dislike this pattern. It doesn’t scale well, and tends to make large and complex functions.
The problem with one function (or a few) running your entire app is that when you scale, you end up scaling your entire application rather than the specific element that needs it.
If you have one part of your web application that gets 1 million calls, and another that gets 1 thousand calls, you have to optimise your function for the million, whilst including all the code for the thousand. That’s a waste, and you can’t easily optimise for the thousand. Separate them out. There’s so much value in that.
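As a rough sketch of what separating them out might look like, here are two single-purpose handlers in Python. The handler names and the event shape (a path parameter and a JSON body) are hypothetical and chosen only for illustration; the point is that each handler would be deployed as its own function, with no switch statement deciding what to do.

```python
import json

# Hypothetical handlers: each one would be deployed as its own function,
# so each can be sized, scaled and monitored separately.


def get_product(event, context=None):
    """High-traffic read path: returns one product by id."""
    product_id = event["pathParameters"]["id"]
    return {"statusCode": 200, "body": json.dumps({"id": product_id})}


def create_order(event, context=None):
    """Low-traffic write path: accepts an order payload."""
    order = json.loads(event["body"])
    return {"statusCode": 201, "body": json.dumps({"accepted": order["sku"]})}
```

With this split, the read path can be tuned for the million calls without carrying the order code, and a bug in `create_order` cannot take `get_product` down with it.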
Functions don’t call other functions
Functions calling other functions is an anti-pattern.
There are a very few edge cases where this is a valid pattern, but they are not easily broken down.
Basically, don’t do it. You simply double your cost (the calling function is billed while it waits on the one it called), make debugging more complex, and remove the value of isolating your functions.
Functions should push data to a data store or queue, which should trigger another function if more work is needed.
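A minimal sketch of that hand-off, assuming an in-memory `deque` as a stand-in for a managed queue. The function names, the message fields and the `drain` helper (which plays the part of the platform delivering messages) are all hypothetical.

```python
import json
from collections import deque

# Stand-in for a managed queue. On a real platform the queue service
# would invoke the consumer when a message arrives.
thumbnail_queue = deque()


def handle_upload(event, context=None):
    """Publishes a message describing the work instead of calling the next function."""
    message = {"image_key": event["image_key"], "size": 128}
    thumbnail_queue.append(json.dumps(message))
    return {"status": "queued"}


def handle_thumbnail(message_body, context=None):
    """Triggered once per message; knows nothing about who produced it."""
    message = json.loads(message_body)
    return f"thumbnail for {message['image_key']} at {message['size']}px"


def drain(queue, consumer):
    """Simulates the platform delivering queued messages to the consumer."""
    results = []
    while queue:
        results.append(consumer(queue.popleft()))
    return results
```

The producer returns as soon as the message is queued, so it is not billed for the time the consumer takes, and either side can be replaced without the other knowing.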
Use as few libraries in your functions as possible (preferably zero)
This one seems obvious to me.
Functions have cold starts (when a function is started for the first time) and warm starts (it’s been started, and is ready to be executed from the warm pool). Cold starts are affected by a number of things, but the size of the zip file (or however the code is uploaded) is part of it, as is the number of libraries that need to be instantiated.
The more code you have, the slower it is to cold start.
The more libraries that need instantiating, the slower it is to cold start.
As an example, Java is a brilliantly performant language on a warm start on some platforms. But if you use lots of libraries, you can find it taking many seconds to cold start. You almost certainly don’t need them, and slow cold starts hurt not just at startup but every time the function scales out too.
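One way to see the cost of a dependency is simply to time its import. The sketch below does that in Python; the helper names are hypothetical, and the numbers only mean something when run in a fresh process, because a module that is already loaded imports almost instantly.

```python
import importlib
import time


def time_import(module_name):
    """Returns the seconds spent importing a module.

    Only meaningful for modules not already imported in this process.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def report(module_names):
    """Maps each module name to its measured import time, slowest first."""
    timings = {name: time_import(name) for name in module_names}
    return dict(sorted(timings.items(), key=lambda item: item[1], reverse=True))
```

Calling `report(["json", "decimal"])` at the top of a fresh interpreter gives a rough ranking of what each import adds to a cold start. CPython’s `-X importtime` option gives a more detailed breakdown of the same thing.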
As another point, I’m a big believer in developers only using libraries when necessary. That means starting with none, and ending with none unless I can’t build what’s needed without one.
Things like express are built for servers, and serverless applications do not need all the elements in there. So why introduce all the code and dependencies? Why bring in superfluous code? Not only is it code that will never get run, it could also introduce a security risk.
There are so many reasons for this being a best practice. Of course, if there is a library that you have tested, know and trust, then absolutely bring it in, but the key element there is testing, knowing and trusting the code. Following a tutorial is not the same thing.
Avoid using connection based services e.g. RDBMS
Just don’t unless you have to.
This one will get me into the most trouble. A lot of web application people will jump on the “but RDBMS are what we know” bandwagon.
It’s not about RDBMS. It’s about the connections.
Serverless works best with services rather than connections.
Services are intended to return responses to requests really rapidly and to handle the complexity of the data layer behind the service. This is of huge value in the serverless space, and why something like DynamoDB fits so well within the serverless paradigm.
To be honest, serverless people are not against RDBMS, they are against connections. Connections take time, and if you imagine a function scaling up, each function environment needs a connection, so you’re introducing both a bottleneck and an I/O wait into the cold start of the function. It is needless.
So if you have to use an RDBMS, put a service that handles connection pooling in the middle. An auto scaling container of some description, there simply to handle that, would be great.
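To make the pooling idea concrete, here is a minimal sketch of what such a middle service might do internally: hold a fixed number of connections and lend them out, so that callers wait for a free one instead of opening their own. It uses `sqlite3` from the Python standard library purely as a stand-in for a real RDBMS, and the `ConnectionPool` class and its methods are hypothetical.

```python
import queue
import sqlite3


class ConnectionPool:
    """Holds a fixed number of connections and lends them out one at a time."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # sqlite3 is only a stand-in for a networked database here.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    def run(self, sql, params=()):
        """Borrows a connection, runs one query, and always hands the connection back."""
        conn = self._idle.get(timeout=5)  # callers wait here rather than connecting
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            self._idle.put(conn)

    def idle_count(self):
        """Number of connections currently available to borrow."""
        return self._idle.qsize()
```

However many function environments spin up, the database only ever sees `size` connections. The functions talk to the pool as a service, which is the shape serverless prefers.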
The biggest point to make here is that serverless architecture may well require you to rethink your data layer. That’s not the fault of serverless. If you try to reuse your current data layer thinking and it doesn’t work, then it’s probably down to a lack of understanding of serverless architectures.
One function per route (if using HTTP)
Avoid using the single function proxy where possible. It doesn’t scale well and doesn’t help isolate issues. There are occasions where a single function is reasonable, e.g. where the functionality of a series of routes is tied strictly to a single table and is very much decoupled from the rest of the application, but that is an edge case in most applications I’ve worked on.
This adds complexity in terms of management, but it really helps in terms of isolation of errors and issues when your application scales. Start as you mean to go on.
But then, you were using some sort of configuration management tool to run everything anyway, weren’t you? And you already used CI and CD tools of some sort, right? You still have to DevOps with serverless.
Learn to use messages and queues (async FTW)
Serverless applications tend to work best when the application is asynchronous. This isn’t straightforward for web applications, where the tendency is to do request-response and lots of querying.
Going back to functions not calling other functions, it’s important to point out that messages and queues are how you chain functions together. A queue acts as a circuit breaker in the chaining scenario: if a function fails, you can easily drain down a queue that has backed up because of the failure, or push the messages that fail to a dead letter queue.
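A small sketch of the dead letter idea, again with in-memory `deque`s standing in for real queues. The retry limit, the message fields and the function names are assumptions made for illustration; managed queue services typically offer this behaviour as configuration rather than code.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit, purely illustrative


def process_queue(work_queue, dead_letters, handler):
    """Delivers each message to handler; retries failures, then dead-letters them."""
    processed = []
    while work_queue:
        message = work_queue.popleft()
        try:
            processed.append(handler(message["body"]))
        except Exception as error:
            message["attempts"] += 1
            message["last_error"] = str(error)
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(message)  # park it for later inspection
            else:
                work_queue.append(message)  # retry later, behind other work
    return processed


def parse_amount(body):
    """Example consumer: fails on anything that is not an integer string."""
    return int(body)
```

A message that keeps failing ends up in `dead_letters` with its attempt count and last error attached, while the healthy messages behind it still get processed.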
Basically, learn how distributed systems work.
For client applications with a serverless back end, the best approach is to look into CQRS (Command Query Responsibility Segregation). Separating out the point of retrieving data from the point of inputting data is key to this kind of pattern.
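As a very small sketch of that separation, the example below keeps an append-only log on the write side and a precomputed view on the read side. The account example, the in-memory stores and every function name are hypothetical; in a real system the projection would be a function triggered by the event stream.

```python
from collections import defaultdict

event_log = []                   # write side: append-only record of what happened
balance_view = defaultdict(int)  # read side: precomputed view for queries


def handle_deposit(account_id, amount):
    """Command: validates and records the change; does not answer queries."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    event = {"type": "deposited", "account_id": account_id, "amount": amount}
    event_log.append(event)
    return event


def project(event):
    """Projection: updates the read view from one event."""
    if event["type"] == "deposited":
        balance_view[event["account_id"]] += event["amount"]


def get_balance(account_id):
    """Query: reads only the view, never the event log."""
    return balance_view.get(account_id, 0)
```

Because commands and queries never share a code path, each can be its own function and scale independently, and the view can be rebuilt or reshaped by replaying the log.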
Data flows, not data lakes
In a serverless system, your data flows through your system. It can end up in a data lake, but the likelihood is that while it’s in your serverless system it is in some sort of flow. So treat all data like it is in motion, not at rest at any point.
It’s not always possible, but try to avoid querying from a data lake within a serverless environment.
Serverless requires you to rethink your data layer significantly. This is the biggest gotcha for people new to serverless, who tend to reach for the RDBMS and fall flat, not only because the scaling catches them out but because their data structures become too rigid too fast.
You will find that your flows will change as your application changes and scale will change all of it. If all you have to do is redirect a flow it’s easy. It is far harder damming a lake.
I know this point is a bit more “out there” than the others, but it’s not a straightforward one to make.
Just coding for scale is a mistake; you have to consider how it scales
It is very easy to create your first serverless application and watch it scale. If you don’t understand what you’ve done, though, you can easily fall into the same trap as with every other auto-scaling solution.
If you don’t consider your application and how it will scale, then you set yourself up for problems. If you make something with a slow cold start (lots of libraries and an RDBMS, for example) and then get a spike in usage, you could end up significantly increasing the concurrency of your function, then maxing out your connections and slowing your application down.
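A back-of-the-envelope version of that scenario, as a sketch. It uses Little’s law (requests in flight = arrival rate × time each request takes) to estimate concurrency, and assumes one database connection per function environment. The function names and all the numbers in the usage note are illustrative, not measurements.

```python
import math


def required_concurrency(requests_per_second, avg_duration_ms):
    """Little's law estimate of how many function environments run at once."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def connections_needed(concurrency, connections_per_environment=1):
    """Assumes every environment holds its own connection(s)."""
    return concurrency * connections_per_environment


def exceeds_limit(requests_per_second, avg_duration_ms, max_connections):
    """True when the estimated connection count passes the database limit."""
    concurrency = required_concurrency(requests_per_second, avg_duration_ms)
    return connections_needed(concurrency) > max_connections
```

At 50 requests per second and 200 ms per request, that is about 10 concurrent environments and 10 connections, comfortably under an assumed limit of 100. A spike to 2,000 requests per second, with cold starts and contention pushing requests to 500 ms, needs about 1,000, ten times that limit.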
So, don’t just drop an application in, and then imagine that it will work the same under load. Understanding your application under load is still part of the job.
Conclusion
There are lots more things I could have put in here. This is my opinion, based on the things that I most often have to explain to people when I talk to them.
I haven’t mentioned things like how to plan your application, or how to consider costing out an application or anything like that as that’s slightly out of scope.
Looking forward to hearing other people’s thoughts. Pretty sure I’m going to get a flood of people telling me I’m wrong about RDBMS. As with containers, I don’t hate RDBMS, but I like to use the right tools for the right jobs. Know your tools!