Making sense of serverless

The now annual Serverlessconf NYC (2019) event was held in NYC in October. This was a great opportunity to assess the current state of ‘serverless’ – what it is, why it’s important, whether and how it should factor in enterprise cloud planning, and what challenges there are around the serverless space.

The content below is informed by presentations given during the conference, but opinions and conclusions are my own.

What exactly is serverless

‘Serverless’ architectures are often correlated with ‘functions-as-a-service’ (FaaS) but ‘serverless’ extends far beyond FaaS. At the simplest – and most pragmatic – level, ‘serverless’ architectures do not have any ‘servers’ (i.e., long-running virtual and/or physical machines) to manage – i.e., no OS patches, no upgrades, no capacity to manage, etc. Operational overhead related to infrastructure management is zero. This means development teams are 100% responsible for their serverless application architectures.

Of course, the servers do not disappear: they are merely 100% managed by ‘someone else’, with little or no engagement required between development teams and those who manage the serverless infrastructure.

Architecturally, serverless is more than simply no infrastructure to manage. ‘Pure’ serverless architectures exhibit the following characteristics:

Are event-driven
Are function-oriented
Use managed services
Are scalable to zero and up

It’s worth noting that managed services in a a serverless architecture may not themselves be using ‘serverless’ architectures. However, the technology used to deliver each managed service is not exposed in a specific serverless solution architecture.

Why is serverless important?

From a purely economic perspective, serverless enables the introduction of true value-driven businesses (see Simon Wardley’s perspective here), enabling a whole new economics around automation and infrastructure. Essentially, if no value is being delivered, then no cost is being incurred on a serverless architecture (assuming usage of services constitutes ‘value’ delivered).

Because of the traditionally high cost of supporting and maintaining application infrastructure, businesses have historically put a lot of effort into planning new software; new features tend to increase both change and operating costs, usually more than linearly. This makes businesses more change-averse over time, leading to over-planning, a lack of agility and significantly reduced pace of innovation delivery.

Currently (as of 2019), the fashion is to invest in cloud computing to dampen change and operating costs – although this investment is still predominantly in the IaaS space (i.e., using the cloud’s superior economics for compute, storage and networking). Enterprises moving beyond IaaS are faced with either committing to a specific cloud provider’s infrastructure PaaS solutions (such as AWS ECS, Fargate, etc), or investing a lot of effort in building and operating their own orchestration and runtime solutions (usually using Kubernetes as the enabler).

But the war to deliver customer value is being fought above the runtime or infrastructure PaaS, as suggested by Simon Wardley. The risk to many enterprises is that they win the battle for infrastructure PaaS but lose the war to deliver customer value. Indeed, one conference presentation (entitled ‘Killing Kubernetes’) gave a real-world example of how running down the Kubernetes path prematurely can cause a team to lose sight of the customer value to be delivered. (The team in the end decided to go full serverless, and ditch Kubernetes.)

Using a tool likely Wardley Maps enables clarity of thought with respect to critical platform-level components, and which battles it makes sense to fight, vs leveraging what industry innovation will provide.

As Ben Kehoe describes, the point of serverless is to provide focus on business value – it’s not about functions, technology, cost or operations.

Changing Build vs Buy Mindsets

Many enterprises have a policy of ‘buy over build’ – i.e., buy (and customize) a solution rather than build a solution. Customized off-the-shelf solutions have their advantages, but often ultimately lead to businesses being constrained by vendor roadmaps, or by the cost of upgrading/keeping pace with vendor advances. In particular, vendor software is optimized for configurability, whereas what enterprises need is extensible software rather than configurable software.

Serverless provides organizations which do not have a depth in engineering expertise with a path towards ‘build over buy’. Functions and workflows proprietary to a business can be done with serverless functions, while use of managed services can minimize the need for infrastructure expertise. Integrating 3rd party software-as-a-service solution also becomes second nature in a serverless environment, particularly with the advent of integration tools such as AWS EventBridge. Such an architecture is readily extensible, and better suited to meet enterprise needs.

Serverless and the Enterprise

Serverless architectures are being actively used by all kinds of businesses with great success (aCloudGuru, a sponsor of ServerlessConf NYC, is just one example of a successful serverless user). Architectures which rely exclusively on managed services (such as AWS S3, Aurora, Lambda, StepFunctions, DynamoDB, etc) can be considered serverless.

Many enterprises are partially serverless on the cloud, choosing to leverage a cloud provider’s managed service offering as part of a ‘serverful’ solution architecture (e.g., using S3 with EC2). But these architectures do not provide the full benefits of a true ‘serverless’ architecture, as considerable effort is still required to manage the non-serverless elements.

It should be noted as well that just because traditional enterprise software is made available as a ‘managed service’, doesn’t mean that the enterprise overhead of managing that service is reduced: if the cloud provider still exposes all the configurable aspects of the software, there will not be a significant benefit in moving to the managed service. (Microsoft enterprise applications offered on Azure seem to suffer from this affliction.)

Fundamentally, serverless is not yet ready to take on all enterprise workloads – there are many constraints and conflicts between standardizing the serverless runtimes (necessary to allow them to be managed efficiently at scale) and the customization needs of enterprises. In particular, how systems manage state with sufficient performance is likely to remain a challenge – although this is certainly solvable, as emerging architectural best practices for a serverless world establish themselves (the traditional model-view-controller model being a poor model for serverless applications).

For most enterprises, therefore, business solutions should be planned as if performant platform solutions exists, being clear on what the functions are, what the managed services are, and what platform capabilities are assumed. These can then drive further investment decisions to build out (or buy/rent) these capabilities. A ‘serverless first’ mindset is key to this.

Underpinning all these is organizational design – in particular, the concepts and ideas espoused in Team Topologies map very well to this approach.

Decomposing the Monolith

A key use case for serverless is enabling the decomposition of legacy monolithic architectures. Most enterprises do not have the skills or expertise to successfully migrate complex monolithic architectures to microservices, as this requires some skills in developing on and managing highly distributed systems. While technologies like CloudFoundry and SpringBoot go a long way towards minimizing the cognitive load for application developers, organizations require considerable investment to make these technologies available as true managed services across an enterprise.

Serverless offers a route to decompose monolithic architectures without first building out the full capabilities needed to deploy serverful microservice architectures. It allows enterprises to experiment with service-based business solutions without incurring significant or hard-to-reverse costs in infrastructure and/or skills. Once a decomposed architecture begins to prove its worth, it may be unavoidable (for now) to move to serverful microservices at the back-end to scale out, but the business value proposition should be clear by then.

Serverless Challenges

Serverless architectures have their own challenges that organizations need to be prepared to handle, which are different from the challenges that building serverful architectures have.

Key challenges exist around:

Security
Local development and testing
Debugging, tracing, monitoring and alerting
Limit Management
Resilience
Lock-in
Integration testing
Serverless infrastructure-as-code

The above are the challenges specifically raised during the conference. Other challenges may yet reveal themselves.

Security

Serverless requires a different security model than traditional infrastructure. Specifically, security for serverless centers around security of functions, security of data, and security of configuration.

Key attack surfaces for serverless are event data injection, unauthorized deployments, and dependency poisoning. In particular, over-privileged permissions present a significant surface attack area. A good list of attack surfaces is published by Palo Alto/Puresec, a sponsor of the conference.

Serverless components therefore need their own security solutions, as part of an over-arching defense-in-depth security strategy.

Local development and testing

By definition, managed services cannot be available on local (laptop/desktop) development environments, as neither are serverless runtimes such as lambda. Instead, development is expected to happen directly on the cloud, which can cause issues for developers who are periodically disconnected from the internet.

For some development teams, the ability to code and test away from the cloud is important, and in this regard, cloud providers are beginning to standardize more on OCI containers in their serverless runtimes, and allow developers to run these containers locally on their laptop, as well as on standard orchestrated environments such as Kubernetes. Azure and GCP seems to be leading the way in this space, but AWS is always improving its lambda run-times and offering developers more ways to customize them, so this may eventually lead to AWS offering the same features.

The challenge, however, will be to maintain the benefits of serverless while avoiding requiring teams end up managing containers as the new ‘servers’…a trap many teams are likely to fall into.

Debugging, tracing, monitoring and alerting

The challenges here are not unique to serverless – microservices architectures have these challenges in spades. While cloud providers typically provide managed services to assist with these (e.g, AWS X-Ray, AWS CloudWatch, etc), a rich eco-system of 3rd parties also help to address these needs.

In general, while it is possible to get by with provider-native solutions, it may be best to augment team capabilities with a vendor solution, such as Lumigo, Serverless framework, Datadog, Epsagon, etc.

Limit Management

All serverless services have limits, usually defined per account. This protects rogue applications from over-loading lambdas or managed services (such as AWS DynamoDB).

Usually, limits can be increased, but may need a service request to the cloud provider. Limits can also be imposed per account at an enterprise level (for example, via AWS Organizational Units).

It is important that the service limits are known and understood, as incorrectly assuming no limits may have a material impact on a solution architecture. While serverless solutions can scale, they cannot scale infinitely.

Resilience

Resilience for managed services is different from resilience of functions-as-a-service. Managed services need to be available at all times – but the manner and means by which such services maintain availability is generally opaque to the user. Some services may be truly global, but cloud providers tend to make managed service resilient within a specific region (through multiple availability-zones in a region), which requires solution architectures to allow for redundancy across multiple regions in the event a single region fails in its entirety. Recovery in these scenarios may not need to be 100% automated, dependending on recovery time objectives.

For functions-as-a-service (lambdas), if an invocation fails, it should be safe for the runtime to try again (i.e., idempotent processing of events). So the runtime provides most of the resilience.

However, if a lambda depends on a ‘traditional’ service (i.e., not in itself dynamically scalable), there may be resilience issues. For example, a lambda connecting to a traditional relational database via SQL may run out of available server-side connections.

Resource constraints applies to any API which is not fronting a serverless architecture. So lambdas need to ensure sufficient resilience (e.g., circuit breaker pattern) is built-in so that constraints in other APIs do not cause the lambda to fail.

Lock-in

Many enterprises are reluctant to use a particular cloud providers serverless model as they tend to be very proprietary and cloud-provider specific, and therefore moving to another cloud, or enabling a solution to run on any cloud-provider, could involve considerable re-engineering expense.

Firms which are constrained by regulatory or other drivers to avoid provider lock-in have options available. Firms can use multi-cloud serverless frameworks such as Serverless. In addition, there are vendors appearing in the multi-cloud messaging space, with vendors like TriggerMesh offering a serverless multi-cloud event bus.

Some cloud providers are making source code for their lambda services available publically – for example Google Cloud Functions and Azure Functions. Open-source serverless solutions such as Google’s Knative and OpenFaaS are also available. In addition, some vendors, such as Platform9 provide a completely independent solution for lambdas, for organizations which want to deploy lambdas internally – for example, on Kubernetes.

Other mechanisms to minimize the effect of lock-in include the use of standard OCI or docker containers to host serverless functions, which may allow containers to run in other orchestration environments without requiring significant rework. (This doesn’t really help if the container relies on external provider-specific managed services, however.)

Regardless of steps taken to avoid lock-in, some cloud providers may include managed services that may be proprietary to them: once software is built to leverage such a managed service, you have a form of lock-in (in much the same way, for example, you may be locked-in to Oracle or Microsoft databases once you commit to using proprietary features of them).

As such, focusing on avoiding lock-in is, for many firms, going to result in unnecessary complexity. It may be better to exploit a given cloud provider, and manage the business risk associated with a complete provider outage. For regulated services, however, regulators may want to ensure regulated firms are not overly concentrated in one provider.

Integration testing

Integration testing is never easy to fully automate – it is partially reason why there is so much focus on microservices, as each microservice is an independently testable and deployable component. The same applies for lambdas. But each lambda may itself depend on multiple managed services, so how to test those? An excellent piece on serverless testing by Paul Johnston describes the challenge well:

The test boundaries for unit testing a FaaS Function appears to be very close to an integration test versus a component test within a microservice approach.

In essence, because all serverless features are available through APIs, it *should* be easier to build and maintain integration tests, but for now it is still harder than it ought to be.

Serverless Infrastructure as Code

There is a growing sense of dissatisfaction with the limitations of traditional YAML-based configuration languages with respect to serverless – in particular, that the lifecycle and dependencies of serverless resources are not properly represented in existing infrastructure configuration languages. Ben Kehoe gives a flavor of the issues, but this is a complex topic likely to get more attention in the future.

Summary

The key value proposition of serverless is that it permits application developers to focus more on delivering customer value, and to spend less time dealing with infrastructure concerns such as managing servers.

The time is right for organizations to start entering the serverless mindset, and to assess business solutions in the strategic context offered by serverless – whether that means ultimately using external services or designing internal services in a serverless way.

ServerlessConf 2019 was informative, and the presentations were generally accessible to a wide audience. For many presentations, it was not necessary to be a cloud engineer to understand the content and to appreciate the potential transformational opportunities of serverless in the coming years.

I hope that in future events, a broader coalition of business strategic planners and do-ers will be in attendance. It is definitely not a Kubecon, but engineering advances made at events like Kubecon will make the serverless vision possible, while freeing serverless practitioners from the complexities of managing containers, orchestrators and servers.