Auto-scaling
Flexible auto-scaling is a significant advantage for Patchworks users - it means you don't pay for a predetermined capacity that might only be required during peak periods, such as Black Friday.
Our flexible, auto-scaling architecture gives peace of mind by allowing you to start on your preferred plan, with the ability to exceed soft limits as needed. If you require more resources, you can transition to a higher tier seamlessly, or manage overages with ease.
Auto-scaling adjusts computing resources dynamically, based on demand - ensuring efficient, cost-effective resource management that's always aligned with real-time usage. The sections below explain how this works at Patchworks; the overall process breaks down into four stages, summarised at the end of this page.
At Patchworks, every process flow shape has its own microservice and its own Kubernetes pod(s). The diagram below shows how this works:
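As a complement to the diagram, here's a minimal sketch of what one such per-shape deployment might look like. The shape name, image, labels, and resource figures are illustrative assumptions, not Patchworks' actual configuration:

```yaml
# Hypothetical Deployment for a single process flow shape ("map"),
# illustrating the one-microservice-per-shape pattern described above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: map-shape
  labels:
    app: map-shape
spec:
  replicas: 1                 # KEDA scales this out when demand rises
  selector:
    matchLabels:
      app: map-shape
  template:
    metadata:
      labels:
        app: map-shape
    spec:
      containers:
        - name: map-shape
          image: registry.example.com/patchworks/map-shape:latest  # illustrative image
          resources:
            requests:
              cpu: 250m       # what the Kubernetes scheduler uses to place the pod
              memory: 256Mi
```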
Metrics for Kubernetes pods are scraped from Horizon using Prometheus. These metrics are queried by KEDA and - when the given threshold is reached - auto-scaling takes place. This process is shown below:
1. The Prometheus JSON exporter scrapes Horizon metrics, exposing a process count for each Core microservice.
2. Prometheus scrapes these metrics from the JSON exporter.
3. KEDA queries Prometheus, checking whether any Core microservice has reached the process threshold (set to 8).
4. If the threshold is reached, KEDA scales out that Core microservice's pods (see the sketch below).
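Steps 3 and 4 can be expressed as a KEDA ScaledObject with a Prometheus trigger. In this sketch, the metric name, Prometheus address, and target deployment are assumptions for illustration; only the threshold of 8 comes from the description above:

```yaml
# Hypothetical KEDA ScaledObject: scale the map-shape deployment when
# its Horizon process count (as exposed via Prometheus) reaches 8.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: map-shape-scaler
spec:
  scaleTargetRef:
    name: map-shape                       # the Deployment to scale
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # assumed Prometheus address
        query: horizon_processes{service="map-shape"}         # hypothetical metric from the JSON exporter
        threshold: "8"                    # the process threshold described above
```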
The Kubernetes cluster auto-scaler monitors pods and decides when a node needs to be added. A node is added if a pod needs to be scheduled and there aren't sufficient resources to fulfil the request. This process is shown below:
1. The Kubernetes scheduler reads the resource request for a pod and decides whether an existing node has enough resources. If it does, the pod is assigned to that node; if not, the pod is set to a pending state and cannot start.
2. The Kubernetes auto-scaler detects that the pod cannot be scheduled due to a lack of resources.
3. The Kubernetes auto-scaler adds a new node to the cluster node pool, at which point the Kubernetes scheduler detects the new node and schedules the pending pod on it (see the configuration sketch after this list).
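For illustration, this is how the upstream Kubernetes cluster auto-scaler is commonly configured to govern when nodes are added and removed. The cloud provider, node-pool name, and bounds here are assumptions, not Patchworks' settings:

```yaml
# Hypothetical excerpt from a cluster-autoscaler Deployment, showing
# the node-pool bounds that drive node add/remove decisions.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws              # assumed cloud provider
      - --nodes=2:10:core-node-pool       # min:max:node-group (illustrative bounds)
      - --scale-down-unneeded-time=10m    # how long a node must sit idle before removal
```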
To summarise, the auto-scaling process breaks down into four stages:
1. The system continuously monitors traffic and resource usage (CPU, memory).
2. When usage exceeds predefined thresholds, the auto-scaler is triggered.
3. Additional resources/pods are deployed to handle the increased load.
4. When demand drops, resources are scaled back down to optimise costs (see the sketch below).
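The scale-down stage can be expressed on the same hypothetical ScaledObject sketched earlier. The figures below are KEDA's illustrative defaults and assumed bounds, not Patchworks' production values:

```yaml
# Hypothetical scaling bounds and cool-down for the map-shape ScaledObject,
# covering the scale-down stage: when demand drops, replicas are reduced.
spec:
  pollingInterval: 30     # seconds between evaluations of the trigger query
  cooldownPeriod: 300     # seconds after the last active trigger before scaling back to zero
  minReplicaCount: 0      # allow idle shapes to scale to zero (illustrative choice)
  maxReplicaCount: 20     # ceiling during peaks such as Black Friday
```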