Feature request: Async BatchProcessor (use case: slow processing of each item)
See original GitHub issue

Use case
I would like to process the messages that reach a BatchProcessor asynchronously. Processing messages from an SQS queue sometimes depends on HTTP calls or similar operations that can take some time; if they are all executed concurrently, the delays do not accumulate.
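To illustrate the motivation, here is a minimal, self-contained sketch (not using Powertools; `handle_record` is a made-up stand-in for a slow HTTP call) showing that `asyncio.gather` makes the batch take roughly as long as the slowest item, instead of the sum of all items:

```python
import asyncio
import time


async def handle_record(record_id: int, delay: float) -> int:
    # Stand-in for a slow HTTP call made while processing one record
    await asyncio.sleep(delay)
    return record_id


async def main():
    delays = [0.2] * 5

    # Sequential: total time is roughly the sum of the delays (~1.0s here)
    start = time.perf_counter()
    for i, d in enumerate(delays):
        await handle_record(i, d)
    sequential = time.perf_counter() - start

    # Concurrent: total time is roughly the longest single delay (~0.2s here)
    start = time.perf_counter()
    await asyncio.gather(*(handle_record(i, d) for i, d in enumerate(delays)))
    concurrent = time.perf_counter() - start

    print(f"sequential={sequential:.2f}s concurrent={concurrent:.2f}s")
    return sequential, concurrent


asyncio.run(main())
```

This is exactly the accumulated-delay problem described above: five records that each wait 200 ms take about a second sequentially, but about 200 ms when gathered concurrently.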
Solution/User Experience
I have made a slight hack to allow this, extending the BatchProcessor class:
```python
import asyncio
import sys
from typing import List, Tuple, Union

from aws_lambda_powertools.utilities.batch import BatchProcessor, FailureResponse, SuccessResponse


class AsyncBatchProcessor(BatchProcessor):
    async def _aprocess_record(self, record: dict) -> Union[SuccessResponse, FailureResponse]:
        """
        Process a record with the instance's handler

        Parameters
        ----------
        record: dict
            A batch record to be processed.
        """
        data = self._to_batch_type(record=record, event_type=self.event_type, model=self.model)
        try:
            if self._handler_accepts_lambda_context:
                result = await self.handler(record=data, lambda_context=self.lambda_context)
            else:
                result = await self.handler(record=data)
            return self.success_handler(record=record, result=result)
        except Exception:
            return self.failure_handler(record=data, exception=sys.exc_info())

    async def aprocess(self) -> List[Tuple]:
        return list(await asyncio.gather(*[self._aprocess_record(record) for record in self.records]))
```
and the main code to run:
```python
import asyncio
import json

from aws_lambda_powertools import Logger, Tracer
from aws_lambda_powertools.utilities.batch import EventType
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
from aws_lambda_powertools.utilities.typing import LambdaContext

from .async_batch_preprocessor import AsyncBatchProcessor

tracer = Tracer()
logger = Logger()
aprocessor = AsyncBatchProcessor(event_type=EventType.SQS)

a = 1


async def record_handler(record: SQSRecord):
    """
    Process each record here
    """
    global a
    payload: str = record.body
    if payload:
        item: dict = json.loads(payload)
        print(item)
        a += 1
        print(f"sleeping for {a} s")
        await asyncio.sleep(a)
        print("awaited!")
        # code code code...


@logger.inject_lambda_context
@tracer.capture_lambda_handler
def lambda_handler(event, context: LambdaContext):
    return asyncio.run(alambda_handler(event, context))


async def alambda_handler(event, context: LambdaContext):
    batch = event["Records"]
    with aprocessor(records=batch, handler=record_handler):
        await aprocessor.aprocess()  # kick off processing, returns list[tuple]
    return aprocessor.response()
```
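The key property of the `aprocess()` approach is that each record's success or failure is captured individually, so one failing record neither aborts the batch nor disturbs record order. A minimal library-free sketch of that behavior (the function and handler names here are made up for illustration, not the Powertools API):

```python
import asyncio


async def process_record(handler, record):
    # Mirrors _aprocess_record above: the outcome is captured per record,
    # so one failing record does not abort the whole batch
    try:
        result = await handler(record)
        return ("success", record, result)
    except Exception as exc:
        return ("fail", record, exc)


async def process_batch(handler, records):
    # asyncio.gather preserves input order, so results line up with the batch
    return list(await asyncio.gather(*(process_record(handler, r) for r in records)))


async def flaky_handler(record: dict) -> str:
    if record["body"] == "boom":
        raise ValueError("cannot process")
    return record["body"].upper()


records = [{"body": "ok"}, {"body": "boom"}, {"body": "fine"}]
results = asyncio.run(process_batch(flaky_handler, records))
for status, record, payload in results:
    print(status, record["body"])
```

This order-preserving list of per-record outcomes is what lets `aprocessor.response()` build the partial-failure report for SQS after the `with` block exits.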
Alternative solutions
No response
Acknowledgment
- This feature request meets Lambda Powertools Tenets
- Should this be considered in other Lambda Powertools languages? i.e. Java, TypeScript
Issue Analytics
- Created 10 months ago
- Comments: 9 (5 by maintainers)
Please check my comment in the PR @heitorlessa 😃
That's quite interesting!!! Back then, I had issues with an async Lambda handler, as I noticed most customers struggled to remember the order of decorators. Some tests could help here, as I don't yet have advanced knowledge of asyncio.
At first glance, the only challenge I see here is the use of mixins: multiple inheritance doesn't work nicely with compilation (we plan to use Mypyc). It also has a small performance overhead due to super() (and the MRO, to an extent), but that is truly microseconds.
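To make the super()/MRO point above concrete, a minimal sketch (the class names are invented for illustration, not from the Powertools codebase): each cooperative `super().method()` call is one extra lookup along the method resolution order, which is the microsecond-scale overhead mentioned.

```python
class Base:
    def process(self) -> list:
        return ["Base"]


class AsyncMixin:
    def process(self) -> list:
        # super() here resolves through the MRO of the instance's class,
        # not through AsyncMixin's own bases
        return ["AsyncMixin"] + super().process()


class Processor(AsyncMixin, Base):
    pass


# The linearized lookup order Python walks for every super() call:
print([cls.__name__ for cls in Processor.__mro__])
# The cooperative call chain that results:
print(Processor().process())
```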
If you could create a PR with an example of how customers could use and/or test this, I'd love to include it in this week's release; no rush if you can't, too.