The design patterns that govern cloud-based applications are rarely discussed until companies reach a certain scale. While there are countless design patterns to choose from, one of the biggest challenges is managing scale once it becomes necessary.
Rapid growth is both a blessing and a curse for any application, bringing increased revenue but also increased technical challenges. For better scalability, there are a number of design patterns that can make any cloud-based application more fault tolerant and more resistant to the problems that often come with increased traffic.
The following five cloud design patterns help developers better manage unexpected increases in throughput.
Bulkhead
Named after the partitions that divide a ship's hull and help contain flooding, the bulkhead pattern prevents a single failure within an application from cascading into a total failure. While implementing this pattern in the wild is not always straightforward, it is typically found in applications that can keep operating under degraded performance conditions.
An application that implements the bulkhead pattern is designed with resilience in mind. Not every operation remains possible when, for example, the email or caching layer fails, but with enough forethought and clear communication with the end user, the application can still be semi-functional.
When sections of an application are isolated and can operate independently of each other, a subsystem failure can safely reduce the application's overall functionality without bringing everything to a halt. A good example of the bulkhead pattern in action is any application that can run in “offline mode”. While most cloud-based applications require an external API to reach their full potential, fault-tolerant clients can operate without the cloud by relying on cached resources and other workarounds to ensure that the client remains at least marginally usable.
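As a rough illustration of the isolation described above, the following Python sketch gives each subsystem its own compartment with a cap on concurrent calls and a degraded fallback. The `Bulkhead` class, the subsystem names, and the fallback messages are all hypothetical; this is one possible shape for the idea, not a reference implementation.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class Bulkhead:
    """Caps concurrent calls into one subsystem and contains its failures."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        # If the compartment is already full, degrade immediately instead of
        # piling more work onto a struggling subsystem.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return operation()
        except Exception:
            # The failure stays inside this compartment; the caller receives
            # the degraded result instead of an error.
            return fallback()
        finally:
            self._slots.release()


# Hypothetical subsystems, each isolated in its own compartment.
email_bulkhead = Bulkhead("email", max_concurrent=2)
cache_bulkhead = Bulkhead("cache", max_concurrent=10)


def send_receipt() -> str:
    raise ConnectionError("email provider is down")  # simulated outage


def read_profile_from_cache() -> str:
    return "profile from cache"


# The email outage degrades one feature; the cache keeps working normally.
email_status = email_bulkhead.call(send_receipt, fallback=lambda: "receipt queued for later")
profile = cache_bulkhead.call(read_profile_from_cache, fallback=lambda: "profile unavailable")
```

Giving each subsystem its own limit means a slow or failing dependency can exhaust only its own slots, which is the property the ship analogy is pointing at.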
Retry
In many applications, failure is an end state. In more resilient services, however, a failed request can potentially be retried.
The retry pattern, a common cloud design pattern for third-party interactions, encourages applications to expect failures. Processes that implement the retry pattern are built to safely retry failed operations, which produces fault-tolerant systems that require minimal long-term maintenance.
The retry pattern is often seen in webhook implementations. When a service tries to send a webhook to another service, that request can do one of two things:
- Succeed. If the request succeeds, the operation is complete.
- Fail. If the request fails, the sending service may resend the webhook a limited number of times until the request succeeds. To avoid overloading the target system, many webhook implementations use incremental backoff, gradually adding delay between attempts to give a failing destination time to recover before the sender gives up.
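The two outcomes above can be sketched as a small retry loop. This is a minimal Python sketch under stated assumptions: `send_with_retry`, its parameters, and the doubling delay schedule are illustrative choices, since the text only says that delays grow between attempts. The `sleep` function is injectable so the example runs instantly.

```python
import time
from typing import Callable, List


def send_with_retry(
    send: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `send` until it succeeds or `max_attempts` attempts were made."""
    for attempt in range(max_attempts):
        try:
            if send():
                return True  # success: the operation is complete
        except Exception:
            pass  # treat an exception like any other failed delivery
        if attempt < max_attempts - 1:
            # Wait longer after each failure: 1s, 2s, 4s, ... (assumed schedule)
            sleep(base_delay * 2 ** attempt)
    return False  # give up after the last attempt


# Usage: a hypothetical sender that fails twice, then succeeds.
attempts = {"count": 0}
delays: List[float] = []


def flaky_send() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


# Record the delays instead of actually sleeping.
delivered = send_with_retry(flaky_send, sleep=delays.append)
```

With these inputs the third attempt succeeds, after recorded delays of 1 and 2 seconds.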
The retry pattern only works when both the sender and the recipient know that failed requests can be resent. In the webhook example, each webhook often carries a unique identifier, allowing the recipient to verify that a request is never processed more than once. This prevents duplicates while still tolerating errors on the sender's side that could mistakenly resend redundant data.
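On the receiving side, the unique identifier can be used to make processing idempotent. The sketch below is a hypothetical `WebhookReceiver` that remembers the IDs it has handled. It keeps them in memory purely for illustration; a real service would presumably store them somewhere durable.

```python
from typing import Callable, Dict, List, Set


class WebhookReceiver:
    """Processes each webhook ID at most once, even if it is delivered twice."""

    def __init__(self, handler: Callable[[Dict], None]) -> None:
        self._handler = handler
        self._seen_ids: Set[str] = set()  # assumption: in-memory storage

    def receive(self, webhook_id: str, payload: Dict) -> bool:
        """Return True if the payload was processed, False for a duplicate."""
        if webhook_id in self._seen_ids:
            return False  # already handled: acknowledge without reprocessing
        self._handler(payload)
        # Record the ID only after the handler succeeds, so that a delivery
        # whose handler raised can still be retried by the sender.
        self._seen_ids.add(webhook_id)
        return True


# Usage: the sender mistakenly delivers the same webhook twice.
processed: List[Dict] = []
receiver = WebhookReceiver(handler=processed.append)
first = receiver.receive("evt_123", {"order": 42})
second = receiver.receive("evt_123", {"order": 42})
```

Here the first delivery is processed and the second is recognized as a duplicate, so `processed` ends up holding a single payload.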
Circuit breaker
Managing scale can be an incredibly nuanced issue in cloud-based applications, especially for processes with unpredictable performance. The circuit breaker pattern prevents processes from “running away” by cutting them short before they consume more resources than necessary.
To illustrate how this cloud design pattern works, imagine that you have a web page that generates a report from several different data sources. In a typical scenario, this operation may take only a few seconds. However, in rare circumstances, querying the back end can take significantly longer, tying up valuable resources. A properly implemented circuit breaker could interrupt the execution of any report that takes longer than 10 seconds to generate, preventing long-running queries from monopolizing application resources.
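A minimal Python sketch of the report scenario might look like the following. The `CircuitBreaker` class and its parameters are hypothetical. The time limit comes from the example above; the failure counter and cool-down period are assumptions added here to show the breaker "opening" and failing fast after repeated problems. Note that Python cannot forcibly stop a running thread, so this sketch only stops waiting for the result; real code would also need the underlying query to honor a timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to run the operation."""


class CircuitBreaker:
    """Sketch only: the counters are not protected against concurrent callers."""

    def __init__(self, time_limit: float, failure_threshold: int = 3,
                 reset_after: float = 30.0) -> None:
        self.time_limit = time_limit
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._pool = ThreadPoolExecutor(max_workers=4)

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; failing fast")
            self._opened_at = None  # cool-down elapsed: allow one trial call
        future = self._pool.submit(operation)
        try:
            # Stop waiting once the time limit passes (raises a timeout error).
            result = future.result(timeout=self.time_limit)
        except Exception:
            future.cancel()  # only has an effect if the task has not started
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = time.monotonic()  # open the circuit
            raise
        self._failures = 0  # a success closes the circuit again
        return result


# Matches the example above: give up on any report after 10 seconds.
report_breaker = CircuitBreaker(time_limit=10.0)
```

A caller would wrap the slow operation, for example `report_breaker.call(generate_report)` with a hypothetical `generate_report` function, and show the user a friendly error when a timeout or `CircuitOpenError` is raised.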
Queue-based load leveling
Queue-based load leveling (QBLL) is a common cloud design pattern that helps address scale issues as an application grows. Rather than performing complex operations at request time, which adds latency to the functionality exposed to the user, these operations are instead added to a queue that is tuned to process a more manageable number of requests within a given period of time. This design pattern is especially useful in systems where many operations do not need to show immediate results, such as sending emails or calculating aggregate values.
For example, take an API endpoint that needs to make retroactive changes to a large dataset every time it is run. Although this endpoint was designed with a certain traffic threshold in mind, a large increase in requests or rapid growth in user adoption could negatively affect application latency. By offloading this functionality to a queue-based load leveling system, the application infrastructure can more easily support the increased throughput by processing a fixed number of operations at a time.
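To make the idea concrete, here is a small Python sketch of a leveling queue. It assumes a worker or scheduler calls `run_tick` once per interval (say, once a minute); the `LoadLevelingQueue` class and the numbers in the usage example are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class LoadLevelingQueue:
    """Accepts work immediately and processes it at a fixed pace."""

    def __init__(self, jobs_per_tick: int) -> None:
        self.jobs_per_tick = jobs_per_tick
        self._pending: Deque[Callable[[], None]] = deque()

    def submit(self, job: Callable[[], None]) -> int:
        """Enqueue a job and return the current queue depth right away."""
        self._pending.append(job)
        return len(self._pending)

    def run_tick(self) -> int:
        """Run at most `jobs_per_tick` jobs; return how many actually ran."""
        ran = 0
        while self._pending and ran < self.jobs_per_tick:
            job = self._pending.popleft()
            job()
            ran += 1
        return ran

    @property
    def depth(self) -> int:
        return len(self._pending)


# Usage: a burst of 250 requests arrives, but only 100 are processed per tick.
processed: List[int] = []
leveler = LoadLevelingQueue(jobs_per_tick=100)
for request_id in range(250):
    # Bind request_id as a default argument so each job keeps its own value.
    leveler.submit(lambda request_id=request_id: processed.append(request_id))

ticks = 0
while leveler.depth:
    leveler.run_tick()
    ticks += 1
```

The burst is accepted instantly, but it takes three ticks to drain, so the back end never sees more than 100 operations per interval.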
Throttling
An alternative to QBLL is the throttling pattern, which addresses the “noisy neighbor” problem. While the QBLL pattern offloads excess workloads to a queue for more manageable processing, the throttling pattern defines and enforces limits on how often a single client can use a service or endpoint, preventing a “noisy neighbor” from negatively impacting the system for everyone. The throttling pattern can also complement the QBLL pattern, allowing excess workloads to be handled in a managed way while ensuring that the queue does not grow too deep.
Thinking back to the QBLL example, let’s say the API endpoint could originally handle around 100 requests per minute before the heavy work was offloaded to a queue, whereas the API can now accept a maximum of approximately 10,000 requests per minute. Ten thousand is a huge jump from 100, but the queue will still only be able to process around 100 requests per minute without any noticeable impact on the end user. This means that 1,000 API requests would take about 10 minutes to be fully processed, and 10,000 API requests would take about 100 minutes, or almost two hours.
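The arithmetic behind those figures is simply the backlog divided by the processing rate. A tiny Python sketch, with a hypothetical helper name, makes it explicit:

```python
def minutes_to_drain(queued_requests: int, processed_per_minute: int) -> float:
    """Time needed to work through a backlog at a fixed processing rate."""
    return queued_requests / processed_per_minute


small_burst = minutes_to_drain(1_000, 100)   # 10.0 minutes
large_burst = minutes_to_drain(10_000, 100)  # 100.0 minutes, i.e. 1 hour 40 minutes
```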
In a system with evenly distributed requests, every user would experience the slower processing equally, but if one user sends all 10,000 requests, every other user will wait almost two hours before their workloads even begin. A throttling scheme that limits all users to 1,000 requests per second would ensure that no user could monopolize application resources at the expense of another user.
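A per-client limit like this can be sketched as a fixed-window counter. In the Python sketch below, the `PerClientThrottle` class, the client names, and the small limit are all hypothetical, and a fixed window is only one of several possible strategies (token buckets and sliding windows are common alternatives). The clock is injectable so the example does not depend on real time.

```python
import time
from typing import Callable, Dict, List, Tuple


class PerClientThrottle:
    """Fixed-window limiter: at most `limit` requests per client per window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # client_id -> (window start time, requests counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the old window expired; start a new one
        if count >= self.limit:
            self._windows[client_id] = (start, count)
            return False  # over the limit: reject, e.g. with HTTP 429
        self._windows[client_id] = (start, count + 1)
        return True


# Usage with a fake clock: a noisy client is capped, a quiet one is unaffected.
fake_now = [0.0]
throttle = PerClientThrottle(limit=3, window_seconds=60.0, clock=lambda: fake_now[0])
noisy_results: List[bool] = [throttle.allow("noisy") for _ in range(5)]
quiet_result = throttle.allow("quiet")
fake_now[0] = 60.0  # one window later, the noisy client may send again
noisy_after_reset = throttle.allow("noisy")
```

Because each client has its own counter, the noisy client's rejected requests never reach the queue, and other clients keep their full allowance.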
The 6 month rule
It can be incredibly difficult to scale a cloud-based application. Often, IT teams must choose between implementing a design pattern that can support application growth for another six months and one that can support application growth for another six years.
In my experience, options that fall within the six-month timeframe are the most cost-effective. Spend a few weeks to buy yourself six months that will meet the needs of the business and its users. That is more efficient than spending a year building a more robust system that is much harder to change.
A medium-term goal is not the same as short-sighted hacks and band-aids. Careful implementation of common design patterns can support the long-term maintenance of an application while still being flexible enough to adapt to changing circumstances.