Hexaview Technologies

AWS Lambda Scaling with MSK

How to invoke Lambda in parallel with an MSK trigger?

Why are we writing this blog when AWS already provides proper documentation for each of its services? Because of one paragraph in that documentation, quoted later in this post, that changed our approach.

So, let’s start with the introduction and requirements, and then proceed to the demonstration.

We are from the FinTech family, and the FinTech world operates at scale. Back in July 2021, we at Hexaview were working on a new project to manage millions of reconciliation and rebalance transactions.

The ask was:

  • Each recon/rebalance transaction should be immutable (a separate process that comes up, does the job, and gets killed).
  • A few million reconciliations are expected within a few minutes.
  • There could be traffic spikes (up to a few hundred requests per second).

So, as the requirements show, we needed a system capable of performing a million reconciliations in a few minutes. Our organization is based on AWS, which offers multiple compute services such as EC2, ECS, EKS, Lambda, and Fargate. The main problem was the rebalancing transactions.

Because recon and rebalance transactions are high-scale and time-critical (and traffic spikes are normal), a queue-based solution was preferred. We ran multiple POCs comparing queuing systems and compute services.

We picked two Amazon services, MSK and Lambda, to build our system.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming and event data.

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. Our main task was to run multiple Lambda invocations in parallel, based on predefined project criteria/attributes and MSK triggers.

Proposed Architecture
How does Lambda work with MSK?

Lambda started supporting MSK triggers in 2020. In simple terms, whenever a message arrives in a Kafka topic on the MSK cluster, a Lambda function is triggered to process it.

Lambda reads messages from MSK in batches and passes them to your function as an event payload. The maximum batch size is configurable (the default is 100 messages). To keep each transaction immutable, we set the batch size to 1 (1 Lambda invocation = 1 rebalance).

So, at this point everything seemed perfect: we had MSK for queuing, and Lambda, which comes with many benefits such as not having to think about infrastructure or auto-scaling.
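To make the event payload concrete, here is a minimal sketch of a handler that unpacks an MSK trigger event. It assumes the usual shape of that event: a "records" dictionary keyed by "<topic>-<partition>", with base64-encoded values. The topic name "rebalance", the sample payload, and the helper name extract_messages are hypothetical, used only for illustration.

```python
import base64


def extract_messages(event):
    """Return (topic-partition, offset, decoded value) for each record in the event."""
    messages = []
    # "records" maps "<topic>-<partition>" to a list of Kafka records.
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            # Record values arrive base64-encoded.
            payload = base64.b64decode(record["value"]).decode("utf-8")
            messages.append((topic_partition, record["offset"], payload))
    return messages


def lambda_handler(event, context):
    messages = extract_messages(event)
    for topic_partition, offset, payload in messages:
        print(f"{topic_partition} offset={offset}: {payload}")
    return {"processed": len(messages)}


# Hypothetical sample event with a single record, as delivered with batch size 1.
sample_event = {
    "eventSource": "aws:kafka",
    "records": {
        "rebalance-0": [
            {
                "topic": "rebalance",
                "partition": 0,
                "offset": 15,
                "value": base64.b64encode(b'{"portfolio": "demo"}').decode("ascii"),
            }
        ]
    },
}
```

With a batch size of 1, each invocation should see exactly one record, so the loop body runs once per invocation.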

The result from our first attempt:

We started producing messages to the MSK topic, and the Lambda function was configured with the MSK trigger, but the function was not being invoked in parallel.

So we went back to the AWS documentation, where this paragraph caught our attention: “Every 15 minutes, Lambda evaluates the consumer offset lag of all the partitions in the topic. If the lag is too high, the partition is receiving messages faster than Lambda can process them. If necessary, Lambda adds or removes consumers from the topic.” With this in mind, we changed our approach and were able to invoke Lambda in parallel.

Note: In this blog, we are not discussing NETWORK, INFRA, DATA, or SECURITY.

Let’s start our demonstration: “How to invoke Lambda in parallel with an MSK trigger”.

Test Bed

Lambda Configurations

  • Runtime: Python 3.6
  • Memory: 128 MB

Trigger Configuration

  • Trigger type: MSK
  • Batch size: 1
  • Starting Position: LATEST
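As a rough sketch, the trigger settings above could be expressed as the arguments to Lambda's CreateEventSourceMapping API. The cluster ARN, function name, and topic name below are hypothetical placeholders, and the boto3 call is left as a comment because it needs AWS credentials and a real cluster.

```python
def build_msk_trigger_params(cluster_arn, function_name, topic):
    """Build keyword arguments matching the trigger configuration used in the test."""
    return {
        "EventSourceArn": cluster_arn,   # ARN of the MSK cluster
        "FunctionName": function_name,
        "Topics": [topic],               # Kafka topic to consume from
        "BatchSize": 1,                  # 1 Lambda invocation = 1 message
        "StartingPosition": "LATEST",
    }


params = build_msk_trigger_params(
    # All three values are hypothetical placeholders.
    cluster_arn="arn:aws:kafka:us-east-1:123456789012:cluster/demo-cluster/EXAMPLE",
    function_name="recon-rebalance-worker",
    topic="rebalance",
)

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("lambda").create_event_source_mapping(**params)
```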

MSK Configuration

  • Broker Type: kafka.t3.small
  • Broker Per Zone: 1
  • Number of Brokers: 2
  • 1 Lambda Invocation = 1 Recon/Rebalance immutable process

The test parameters were:

  • Partitions per topic: 10 (case 1), 20 (case 2), 100 (case 3)
  • Time taken by each Lambda invocation: 3 seconds

Test Case(s)

Within each scenario, messages were produced at a constant rate.


The Lambda function needed only two simple lines of code to execute this scenario; the graph shows the result:

Code Block

import time
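The snippet above is cut off after its import line. Since each Lambda is stated to take exactly 3 seconds, a minimal sketch of such a function, assuming it simply sleeps to simulate one recon/rebalance job, might look like this. The work_seconds parameter is a hypothetical addition so the duration can be changed for local runs.

```python
import time


def lambda_handler(event, context, work_seconds=3):
    # Assumption: the job is simulated by sleeping for a fixed duration.
    time.sleep(work_seconds)
    return {"slept_seconds": work_seconds}
```

Lambda calls the handler with only event and context, so the default of 3 seconds applies when deployed.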


Calculation and value

Message Production Rate = 2 messages per second for 2000 seconds (Total of 4000 messages sent in 2000 seconds at a consistent rate)

Message Consumption Rate = lambda rate. Each lambda took exactly 3 seconds

Number of Partitions = 10

20 * 3 = 60       …which means that 20 invocations of 3 seconds each fill a full 60 seconds, so only one Lambda was executing at any point in time.

Conclusion Summary

  • For the first 15 minutes, only one Lambda runs at any point in time.
  • After 15 minutes, parallel lambdas were invoked (with Max invocation <= number of partitions).
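To see why the single-Lambda phase matters, the backlog built up during the first 15 minutes of case 1 can be worked out from the numbers above (2 messages per second produced, 3 seconds per Lambda). This is a back-of-the-envelope sketch that assumes exactly one Lambda runs back to back for the whole window; the function name is hypothetical.

```python
def first_window_backlog(produce_rate, lambda_seconds, window_seconds=900):
    """Messages produced, processed, and left waiting while only one Lambda runs."""
    produced = produce_rate * window_seconds
    processed = window_seconds / lambda_seconds  # one invocation at a time
    return produced, processed, produced - processed


produced, processed, backlog = first_window_backlog(produce_rate=2, lambda_seconds=3)
print(produced, processed, backlog)
```

Under these assumptions, 1800 messages are produced and 300 are processed, leaving a backlog of 1500 messages by the time parallel invocations begin.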

5 messages per second, messages sent duration: 2000 seconds, number of partitions: 20

Message Production Rate = 5 messages per second for 2000 seconds

Message Consumption Rate = lambda rate. Each lambda took exactly 3 seconds


5 messages per second, messages sent duration: 2000 seconds, number of partitions: 100

Message Production Rate = 5 messages per second for 2000 seconds

Message Consumption Rate = lambda rate. Each lambda took exactly 3 seconds

So, as we can see, the entire Lambda Execution process can be split into three phases.

  • Ramp-Up Phase: First 15 minutes
  • Max Throughput Phase
  • Cool Down Phase: Last 15 minutes.

Number of Portfolios that can be processed in the first 15 minutes = 900 / Average time taken by a Portfolio (in seconds).

Maximum Number of Portfolios that can be processed per minute after 15 minutes = Number of Partitions * 60 / Average time taken by a Portfolio (in seconds). This can vary with the incoming message rate.

Cool Down Time ~ 15 minutes
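The two formulas can be evaluated for the three test cases (10, 20, and 100 partitions, 3 seconds per Lambda). This is a small sketch; the function names are hypothetical, and the results are upper bounds rather than measured values.

```python
def portfolios_in_first_15_minutes(avg_seconds):
    """Only one Lambda runs during the first 900 seconds."""
    return 900 / avg_seconds


def max_portfolios_per_minute(partitions, avg_seconds):
    """Upper bound once one Lambda per partition can run in parallel."""
    return partitions * 60 / avg_seconds


AVG_SECONDS = 3
for partitions in (10, 20, 100):
    print(
        partitions,
        portfolios_in_first_15_minutes(AVG_SECONDS),
        max_portfolios_per_minute(partitions, AVG_SECONDS),
    )
```

With a 3-second Lambda, this gives 300 portfolios in the first 15 minutes in every case, and ceilings of 200, 400, and 2000 portfolios per minute afterwards. At 5 messages per second (300 per minute), the ceiling for 20 or 100 partitions is above the incoming rate, so actual throughput would be limited by the producer instead.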


We ran the test in multiple cases, and the conclusions are as follows:

  • During the first 15 minutes, only one Lambda runs at a time, irrespective of the number of partitions in the topic.
  • Hence, the number of Lambda invocations that can be executed in the first 15 minutes = 900 / average time taken by a Lambda (in seconds).
  • Every 13-15 minutes, the Lambda trigger re-evaluates the incoming message rate and increases the number of parallel Lambda invocations.
  • After 15 minutes, the maximum number of Lambdas that can run in parallel = number of partitions.
