I’ve been thinking over the past few months about how you could make serverless better, mainly because those of us who use serverless at scale (note: there aren’t very many of us) have been forced to push the available tools into roles that they weren’t built for.
I’m enjoying Ben Kehoe’s posts on how he sees the future of serverless and he often gets me thinking hard about some of the problems that we face. The other day he got me thinking about Service Discovery as the missing lynchpin of Serverless.
Service Discovery as a Service: The missing serverless lynchpin
A vision for loosely-coupled, high-performance serverless architecture: Part 1
I agree although I have approached this from a different angle and with a different thought process. Here was my response to that post:
The way I see it, Serverless is not difficult to understand from a Dev point of view. Logic and code are relatively simple.
It’s much more difficult to understand from an Ops point of view.
The key for me with Serverless is Events.
And Events are an interesting problem in an Ops world.
In fact, the way that most systems are built and most systems are thought of, the Event is essentially a way of triggering something else. Within Ops, Events are generally ignored or treated as a payload carrier only.
A service produces an event…
The event goes to where it’s supposed to go…
The event gives its payload to the service.
This is all well and good, but after reading the posts around Service Discovery, it really got me thinking.
Events are really important… like REALLY important.
But we don’t treat events as first class citizens because we’ve mostly treated them as internal function calls only.
We treat them like we treat people that deliver post. We’d rather not know how a postal payload actually got here (because it makes you feel guilty) but we’re glad that it arrived.
(So, as a specific example, I’m going to use AWS, because that’s what I know)
Within the AWS environment, event routing is relatively simple.
AWS utilises something called an ARN — an Amazon Resource Name.
Amazon Resource Names (ARNs) and AWS Service Namespaces - Amazon Web Services
Describes the ARN formats which uniquely identify AWS resources and the AWS service namespaces.
Simply, these are addresses in the AWS cloud environment. It means I can send a payload to another service without having to think too hard.
Which means with the post analogy, that I can address a “letter” to an ARN, and it will get sent to that ARN.
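To make the “address” idea concrete, here is a minimal Python sketch that splits an ARN into its parts. It assumes the general colon-separated layout (arn, partition, service, region, account ID, resource); the function name, account number and Lambda name are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arn:
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN string into its colon-separated components."""
    # The resource part may itself contain colons (e.g. "function:name"),
    # so split at most five times and keep the remainder together.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    return Arn(partition, service, region, account_id, resource)


# Hypothetical Lambda function ARN, used only as an example address.
example = parse_arn("arn:aws:lambda:eu-west-1:123456789012:function:process-order")
```

Everything in that string is a fixed location: the service, the region, the account and the resource are all baked into the address itself.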
But in my head, there is something really problematic and interesting about this.
The problem is that you can’t redirect a payload once it’s sent.
In the UK, when you move house, you can set up a redirect service (for a small fee) which means you can reroute your post to a different address.
What if an ARN wasn’t the fixed endpoint to a service?
What if an ARN payload could be re-routed?
What if there was a way the ARN system could be more like DNS?
ARN + DNS = ARNS?
At present, when I send a payload in AWS to an ARN, I have zero control over how that event gets sent, but certainty that it will arrive.
The event goes to the ARN.
Which is great! It’s exactly what I wanted.
But if I could decide that it goes through a “service” that routes my ARN request for me, then I would be able to do some really interesting things with that.
You could reroute an event meant for one Lambda to another Lambda…
You could reroute a database write to somewhere else…
You could push data originally going to an SNS topic into a holding SQS queue while you fix an error somewhere else…
You could create a fanout pattern, sending a single event to multiple places…
You could do canarying by having the router send events to multiple different Lambda functions (sort of a more complex fanout I suppose) with a weighting on each route (10%-90% or 50%-50% or even 30%-50%-20%)…
You could do a lot of things, without having to rewrite a bunch of code or manage the logic inside a lambda, or EC2 instance or “roll your own”…
This would be hugely useful in things like deployment. You would be able to deploy a new function and switch over to it without having to do the logic in another Lambda function or even do an in-place swap (which at scale is scary).
Oh and TTL…
If you added TTL into the scenario you could easily provide a caching mechanism to ensure that the service doesn’t just route things to the same place over and over and that it works as quickly as possible.
That’s basically how the major part of DNS works after all.
And a TTL would mean that, for those sections of event routing that don’t need to change very often, you would be able to cache that route relatively easily.
So, how do you implement it?
As a simple first step...
You create an “ARN Routing” Service maybe called ARNRS or simply ARNS.
And in the ARNS, you can create a pseudo-ARN, which I’ll call a psARN (silent p … pronounced sarn).
That psARN can then be mapped to one or more real ARNs belonging to other services.
e.g. I create an event psARN endpoint like this:
and within the service, I could say that this psARN maps to this Lambda function ARN
That way I would control how the events flow around my system relatively easily.
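As a rough sketch of what that mapping could look like, here is a tiny in-memory stand-in for the routing service in Python. None of this is a real AWS API: the ArnRouter class, the psARN string format and the function names are all invented for illustration.

```python
class ArnRouter:
    """Hypothetical in-memory stand-in for the ARNS idea; not an AWS API."""

    def __init__(self) -> None:
        self._routes: dict[str, str] = {}

    def map(self, psarn: str, target_arn: str) -> None:
        # Point (or re-point) a psARN at a real ARN.
        self._routes[psarn] = target_arn

    def resolve(self, psarn: str) -> str:
        # Look up where events addressed to this psARN should go.
        try:
            return self._routes[psarn]
        except KeyError:
            raise LookupError(f"no route for {psarn!r}") from None


router = ArnRouter()
PSARN = "psarn:example:order-created"  # made-up format, purely illustrative

router.map(PSARN, "arn:aws:lambda:eu-west-1:123456789012:function:orders-v1")
before = router.resolve(PSARN)

# Switching over to a new function is a routing change, not a code change.
router.map(PSARN, "arn:aws:lambda:eu-west-1:123456789012:function:orders-v2")
after = router.resolve(PSARN)
```

The senders only ever know the psARN, so re-pointing it is what moves the traffic.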
Add a TTL to that (maybe 6 hours or 48 hours or something) and, essentially, the sending service would cache the result of the lookup, meaning that it doesn’t need to touch this ARN Service at all within the TTL.
Of course, this means that deployment needs to be more careful, and maybe there would need to be a way of overriding the TTL with all services in an emergency, but the idea is solid.
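Here is a minimal sketch of the sender-side caching in Python, including an emergency flush. The CachingResolver class and the lookup callable are assumptions of mine, standing in for whatever the real routing service would expose.

```python
import time
from typing import Callable


class CachingResolver:
    """Caches psARN lookups on the sending side for ttl_seconds."""

    def __init__(self, lookup: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup      # hypothetical call to the routing service
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the TTL can be tested
        self._cache: dict[str, tuple[str, float]] = {}

    def resolve(self, psarn: str) -> str:
        now = self._clock()
        hit = self._cache.get(psarn)
        if hit is not None and now < hit[1]:
            return hit[0]          # still fresh: no call to the routing service
        target = self._lookup(psarn)
        self._cache[psarn] = (target, now + self._ttl)
        return target

    def flush(self) -> None:
        # Emergency override: forget every cached route immediately.
        self._cache.clear()
```

A long TTL keeps the routing service out of the hot path, and the flush is the escape hatch for when a bad route has to be pulled before the TTL runs out.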
And if you allow a psARN to map to two or more ARNs, then you could do fanout.
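Fanout could be as simple as the following Python sketch, assuming the routing table maps each psARN to a list of ARNs. The fan_out function, the deliver callback and all the names in the table are hypothetical.

```python
from typing import Callable


def fan_out(routes: dict[str, list[str]], psarn: str, payload: dict,
            deliver: Callable[[str, dict], None]) -> list[str]:
    """Send one payload to every ARN mapped behind a psARN."""
    targets = routes.get(psarn, [])
    for arn in targets:
        deliver(arn, payload)  # deliver stands in for the real send
    return list(targets)


# Made-up routing table: one psARN, two destinations.
routes = {
    "psarn:example:order-created": [
        "arn:aws:lambda:eu-west-1:123456789012:function:send-receipt",
        "arn:aws:sqs:eu-west-1:123456789012:audit-queue",
    ]
}

delivered = []
sent_to = fan_out(routes, "psarn:example:order-created", {"order_id": 42},
                  lambda arn, payload: delivered.append((arn, payload)))
```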
And if you add a weighting to those multiple ARNs behind a psARN you could do simple canarying.
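A sketch of the weighted choice in Python, assuming each target behind the psARN carries a relative weight. The pick_target function and the ARNs are invented, and a real router would presumably do this on the service side rather than in the caller.

```python
import random


def pick_target(weighted_targets: list[tuple[str, float]],
                rng: random.Random) -> str:
    """Pick one ARN at random, in proportion to its weight."""
    arns = [arn for arn, _ in weighted_targets]
    weights = [weight for _, weight in weighted_targets]
    return rng.choices(arns, weights=weights, k=1)[0]


# Hypothetical 90%-10% canary split between two versions of a function.
canary_routes = [
    ("arn:aws:lambda:eu-west-1:123456789012:function:orders-v1", 90),
    ("arn:aws:lambda:eu-west-1:123456789012:function:orders-v2", 10),
]

rng = random.Random(0)  # seeded only so the sketch is repeatable
picks = [pick_target(canary_routes, rng) for _ in range(10_000)]
canary_share = picks.count(canary_routes[1][0]) / len(picks)
```

Over many events the canary share should land close to 10%, and changing the split is just a matter of editing the weights.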
It would also mean that blue/green deployment could be done outside of the services, meaning that the API Gateway or Lambda replacement shouldn’t need to do an “in place” swap of functions for an API (which is how many people do it).
The more I think about this idea…
…the more I like it and wish that I had it.
It echoes Ben Kehoe’s ideas around a Service Registry, but I think it might be a little simpler thinking about it more like DNS.
In fact, you may only need to implement it for Lambda functions to start with to get a lot of impact.
Essentially, an internal AWS “DNS service for ARNs” would be hugely helpful and beneficial.
It does add some complexity though, and there may be some scenarios it isn’t good for, although I can’t think what they would be.
But it does add some other benefits, like maybe being able to map where your services are and how they route to each other more easily.
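For the mapping benefit, a small Python sketch: if the routing table is just data, you can turn it around and ask which psARNs feed each real ARN. The routes_by_target function and the table contents are made up for illustration.

```python
from collections import defaultdict


def routes_by_target(routes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a psARN -> ARNs table into ARN -> psARNs."""
    inverted: dict[str, list[str]] = defaultdict(list)
    for psarn, targets in routes.items():
        for arn in targets:
            inverted[arn].append(psarn)
    return {arn: sorted(psarns) for arn, psarns in inverted.items()}


# Hypothetical routing table with one shared destination.
table = {
    "psarn:example:order-created": [
        "arn:aws:lambda:eu-west-1:123456789012:function:send-receipt",
        "arn:aws:sqs:eu-west-1:123456789012:audit-queue",
    ],
    "psarn:example:order-cancelled": [
        "arn:aws:sqs:eu-west-1:123456789012:audit-queue",
    ],
}

service_map = routes_by_target(table)
```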
Like being able to deploy code to the cloud in one big go, and then making it “live” in a more controlled fashion and in stages.
This isn’t a completely thought through solution, but really something to get comment on.
But I’d really like to see something like this in all Serverless solutions, simply because I think it would really help to make Events more “first class” in thinking and also to make systems more flexible for Ops people.
Because Serverless functions are the easy bit.
And because Serverless is all about the Events.
I blog about Serverless and collect all of the posts into one post, so you can read through them. That post is here: