
Autoscaling Kubernetes Workloads on AWS EKS with KEDA

The client operated a microservices platform on AWS EKS, with standard Kubernetes Horizontal Pod Autoscaling (HPA) based on CPU and memory. While this model worked well for synchronous services, one asynchronous document processing service presented a challenge.


Client Overview

Merciv, Inc. is an AI and data analytics company that transforms strategic business intelligence workflows for the retail industry and consumer packaged goods (CPG) brands. At the core of their offering is the Evolving Virtual Analyst (EVA), an AI-driven platform that breaks down data silos by ingesting and unifying information from multiple sources. EVA enables users to simply ask business questions in plain language and receive data-driven answers with intelligent visualisations, empowering them to optimise workflows and discover new growth opportunities.

Understanding their clients' need for the highest levels of reliability, Merciv maintains a comprehensive security program with detailed policies that protect both company and customer data. Merciv's robust security framework includes least-privilege access management with MFA, stringent change management and vulnerability remediation, and secure development practices integrated throughout the product lifecycle. Through Merciv's solutions, customer organisations can extract maximum value from their data and scale their operations like never before.


Challenge

Standard autoscaling methods based on CPU and memory usage were not sufficient for the client’s asynchronous workloads. To maintain performance under unpredictable load, a more event-driven approach was required.

Asynchronous Workload Pattern

Incoming tasks were queued before processing, leading to a delay between workload arrival and resource consumption.

Delayed Autoscaling Reactions

Kubernetes HPA reacted only after CPU/memory usage increased—by which time service bottlenecks had already occurred.

Burst Load Sensitivity

Large, sudden spikes in document processing requests could overwhelm the system before autoscaling caught up.

Operational Constraints

Any solution needed to integrate seamlessly without introducing architectural complexity or requiring a full-scale redesign.

Solution

To bridge the gap between workload demand and scaling responsiveness, Bion Consulting implemented Kubernetes Event-Driven Autoscaling (KEDA) for the queue-driven service, while retaining HPA for the rest of the system.

1. Event-Driven Scaling with KEDA

KEDA enabled autoscaling based on queue length instead of waiting for CPU/memory load. This allowed the service to scale when messages arrived—not after they piled up.
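In KEDA terms, this means attaching a ScaledObject to the consumer Deployment so replica count tracks queue depth rather than resource usage. The case study names only a generic "message queue-based trigger", so the sketch below assumes an AWS SQS queue; the resource names, queue URL, and thresholds are hypothetical:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: document-processor-scaler
  namespace: documents
spec:
  scaleTargetRef:
    name: document-processor        # hypothetical Deployment consuming the queue
  pollingInterval: 15               # check queue depth every 15 seconds
  cooldownPeriod: 120               # delay before scaling back down after the backlog clears
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/111122223333/document-jobs
        queueLength: "10"           # target backlog per replica
        awsRegion: eu-west-1
      authenticationRef:
        name: keda-aws-credentials  # TriggerAuthentication granting SQS read access
```

Because the trigger fires on the number of queued messages, new replicas start as soon as work arrives in the queue, before any pod's CPU has risen.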

2. Hybrid Autoscaling Architecture

We designed a hybrid scaling strategy:

  • Queue length as the primary trigger for scaling
  • CPU/memory metrics as guardrails
  • Min/max replica counts to ensure platform stability
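All three elements of the hybrid strategy map onto a single ScaledObject: the queue scaler acts as the primary trigger, a CPU trigger serves as the guardrail (KEDA scales to the highest replica count that any trigger demands), and minReplicaCount/maxReplicaCount bound the range. A sketch with hypothetical values, authentication omitted for brevity:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: document-processor-hybrid
spec:
  scaleTargetRef:
    name: document-processor      # hypothetical Deployment name
  minReplicaCount: 2              # stability floor: always keep consumers running
  maxReplicaCount: 20             # ceiling to protect cluster capacity
  triggers:
    # Primary trigger: scale on queue backlog
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/111122223333/document-jobs
        queueLength: "10"
        awsRegion: eu-west-1
    # Guardrail: also scale out if existing pods saturate on CPU
    - type: cpu
      metricType: Utilization
      metadata:
        value: "75"               # target average CPU utilisation (%)
```

Under the hood, KEDA creates and manages a standard HPA for the target Deployment, so this coexists cleanly with plain HPA on the platform's other services.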

3. Minimal Operational Overhead

By isolating KEDA to a single service and leaving other services on HPA, we avoided unnecessary operational burden or architectural complexity. This targeted integration allowed the document processing service to scale in alignment with actual workload ingress.

Results

The introduction of KEDA brought measurable improvements to the platform’s performance and reliability:

Faster Autoscaling Response

The system scaled proactively based on queue volume, not reactively on resource load.


Improved Queue Processing Times

Backlogs were cleared more efficiently during peak demand, enhancing throughput.


More Predictable Resource Usage

Scaling matched real workload patterns, preventing overprovisioning and underutilisation.


Reduced Complexity

The hybrid model allowed tailored scaling per service without changing the overall architecture.

This intelligent scaling approach ensured high system performance under varying loads—without overengineering.

Technology Stack

To implement event-driven autoscaling for asynchronous services, the following technologies were used:

  • Cloud Platform: AWS EKS, EC2
  • Container Management: Kubernetes, Helm
  • Autoscaling: Kubernetes HPA, KEDA
  • Event Source: Message Queue-based Trigger
  • Monitoring & Metrics: Kubernetes-native metrics and queue backlog indicators

By combining KEDA with traditional HPA, Bion enabled a smarter, more adaptive scaling strategy aligned with real-world demand.

Ready to scale smarter with Kubernetes on AWS? Book a consultation with our experts.