
Autoscaling Kubernetes Workloads on AWS EKS with KEDA

The client operated a microservices platform on AWS EKS, with standard Kubernetes Horizontal Pod Autoscaling (HPA) based on CPU and memory. While this model worked well for synchronous services, one asynchronous document processing service presented a challenge.


Client Overview

Merciv, Inc. is an AI and data analytics company that transforms strategic business intelligence workflows for the retail industry and consumer packaged goods (CPG) brands. At the core of their offering is the Evolving Virtual Analyst (EVA), an AI-driven platform that breaks down data silos by ingesting and unifying information from multiple sources. EVA enables users to simply ask business questions in plain language and receive data-driven answers with intelligent visualisations, empowering them to optimise workflows and discover new growth opportunities.

Understanding their clients' need for the highest levels of reliability, Merciv maintains a comprehensive security program with detailed policies that protect both company and customer data. Merciv's robust security framework includes least-privilege access management with MFA, stringent change management and vulnerability remediation, and secure development practices integrated throughout the product lifecycle. Through Merciv's solutions, customer organisations can extract maximum value from their data and scale their operations like never before.


Challenge

Standard autoscaling methods based on CPU and memory usage were not sufficient for the client’s asynchronous workloads. To maintain performance under unpredictable load, a more event-driven approach was required.

Asynchronous Workload Pattern

Incoming tasks were queued before processing, leading to a delay between workload arrival and resource consumption.

Delayed Autoscaling Reactions

Kubernetes HPA reacted only after CPU/memory usage increased—by which time service bottlenecks had already occurred.

Burst Load Sensitivity

Large, sudden spikes in document processing requests could overwhelm the system before autoscaling caught up.

Operational Constraints

Any solution needed to integrate seamlessly without introducing architectural complexity or requiring a full-scale redesign.

Solution

To bridge the gap between workload demand and scaling responsiveness, Bion Consulting implemented Kubernetes Event-Driven Autoscaling (KEDA) for the queue-driven service, while retaining HPA for the rest of the system.

1. Event-Driven Scaling with KEDA

KEDA enabled autoscaling based on queue length instead of waiting for CPU/memory load. This allowed the service to scale when messages arrived—not after they piled up.
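In KEDA terms, this means attaching a ScaledObject to the consumer Deployment so replica count tracks queue depth rather than resource usage. The case study names only a generic "message queue-based trigger", so the sketch below assumes an AWS SQS queue; the resource names, queue URL, and thresholds are hypothetical:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: document-processor-scaler
  namespace: documents
spec:
  scaleTargetRef:
    name: document-processor        # hypothetical Deployment consuming the queue
  pollingInterval: 15               # check queue depth every 15 seconds
  cooldownPeriod: 120               # delay before scaling back down after the backlog clears
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/111122223333/document-jobs
        queueLength: "10"           # target backlog per replica
        awsRegion: eu-west-1
      authenticationRef:
        name: keda-aws-credentials  # TriggerAuthentication granting SQS read access
```

Because the trigger fires on the number of queued messages, new replicas start as soon as work arrives in the queue, before any pod's CPU has risen.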

2. Hybrid Autoscaling Architecture

We designed a hybrid scaling strategy:

  • Queue length as the primary trigger for scaling
  • CPU/memory metrics as guardrails
  • Min/max replica counts to ensure platform stability
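All three elements of the hybrid strategy map onto a single ScaledObject: the queue scaler acts as the primary trigger, a CPU trigger serves as the guardrail (KEDA scales to the highest replica count that any trigger demands), and minReplicaCount/maxReplicaCount bound the range. A sketch with hypothetical values, authentication omitted for brevity:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: document-processor-hybrid
spec:
  scaleTargetRef:
    name: document-processor      # hypothetical Deployment name
  minReplicaCount: 2              # stability floor: always keep consumers running
  maxReplicaCount: 20             # ceiling to protect cluster capacity
  triggers:
    # Primary trigger: scale on queue backlog
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/111122223333/document-jobs
        queueLength: "10"
        awsRegion: eu-west-1
    # Guardrail: also scale out if existing pods saturate on CPU
    - type: cpu
      metricType: Utilization
      metadata:
        value: "75"               # target average CPU utilisation (%)
```

Under the hood, KEDA creates and manages a standard HPA for the target Deployment, so this coexists cleanly with plain HPA on the platform's other services.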

3. Minimal Operational Overhead

By isolating KEDA to a single service and leaving other services on HPA, we avoided unnecessary operational burden or architectural complexity. This targeted integration allowed the document processing service to scale in alignment with actual workload ingress.

Results

The introduction of KEDA brought measurable improvements to the platform’s performance and reliability:

Faster Autoscaling Response

The system scaled proactively based on queue volume, not reactively on resource load.


Improved Queue Processing Times

Backlogs were cleared more efficiently during peak demand, enhancing throughput.


More Predictable Resource Usage

Scaling matched real workload patterns, preventing overprovisioning and underutilisation.


Reduced Complexity

The hybrid model allowed tailored scaling per service without changing the overall architecture.

This intelligent scaling approach ensured high system performance under varying loads—without overengineering.

Technology Stack

To implement event-driven autoscaling for asynchronous services, the following technologies were used:

  • Cloud Platform: AWS EKS, EC2
  • Container Management: Kubernetes, Helm
  • Autoscaling: Kubernetes HPA, KEDA
  • Event Source: Message Queue-based Trigger
  • Monitoring & Metrics: Kubernetes-native metrics and queue backlog indicators

By combining KEDA with traditional HPA, Bion enabled a smarter, more adaptive scaling strategy aligned with real-world demand.

Ready to scale smarter with Kubernetes on AWS? Book a consultation with our experts.