
Feature request: Async BatchProcessor (use case: slow processing of each item)


Use case

I would like to process the messages that reach a BatchProcessor asynchronously. Processing messages from an SQS queue sometimes depends on HTTP calls or similar operations that can take some time; if they are all run concurrently, the delay does not accumulate.
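
To illustrate the point, here is a minimal, self-contained sketch (the call_external_api coroutine is a hypothetical stand-in for the per-record HTTP call): awaiting records one by one adds up their latencies, while asyncio.gather runs them concurrently so the batch only waits roughly as long as the slowest record.

import asyncio
import time


async def call_external_api(item: int) -> int:
    # hypothetical stand-in for a slow HTTP call made per record
    await asyncio.sleep(1)
    return item


async def main() -> None:
    items = [1, 2, 3]

    start = time.perf_counter()
    for item in items:
        await call_external_api(item)  # sequential: roughly 3 s in total
    print(f"sequential: {time.perf_counter() - start:.1f} s")

    start = time.perf_counter()
    await asyncio.gather(*(call_external_api(item) for item in items))  # concurrent: roughly 1 s in total
    print(f"concurrent: {time.perf_counter() - start:.1f} s")


asyncio.run(main())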

Solution/User Experience

I have made a slight hack to allow this by extending the BatchProcessor class:

import asyncio
import sys
from typing import List, Tuple, Union

from aws_lambda_powertools.utilities.batch import BatchProcessor, SuccessResponse, FailureResponse


class AsyncBatchProcessor(BatchProcessor):

    async def _aprocess_record(self, record: dict) -> Union[SuccessResponse, FailureResponse]:
        """
        Process a record with instance's handler

        Parameters
        ----------
        record: dict
            A batch record to be processed.
        """
        data = self._to_batch_type(record=record, event_type=self.event_type, model=self.model)
        try:
            if self._handler_accepts_lambda_context:
                result = await self.handler(record=data, lambda_context=self.lambda_context)
            else:
                result = await self.handler(record=data)

            return self.success_handler(record=record, result=result)
        except Exception:
            return self.failure_handler(record=data, exception=sys.exc_info())

    async def aprocess(self) -> List[Tuple]:
        # Run all record handlers concurrently. _aprocess_record catches exceptions per record
        # (mirroring the parent class), so gather never short-circuits and failures still
        # end up in the partial-failure report.
        return list(await asyncio.gather(*[self._aprocess_record(record) for record in self.records]))

and the main code to run it:

import asyncio
import json

from aws_lambda_powertools.utilities.batch import EventType
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
from aws_lambda_powertools.utilities.typing import LambdaContext

from aws_lambda_powertools import Tracer, Logger
from .async_batch_preprocessor import AsyncBatchProcessor

tracer = Tracer()
logger = Logger()

aprocessor = AsyncBatchProcessor(event_type=EventType.SQS)

a = 1


async def record_handler(record: SQSRecord):
    """
    Process each record here
    """
    global a
    payload: str = record.body
    if payload:
        item: dict = json.loads(payload)
        print(item)
        a += 1  # demo counter so each record sleeps a bit longer than the previous one
        print(f'sleeping for {a} s')
        await asyncio.sleep(a)  # stand-in for a slow HTTP call or similar
        print('awaited!')
        # code code code...


@logger.inject_lambda_context
@tracer.capture_lambda_handler
def lambda_handler(event, context: LambdaContext):
    return asyncio.run(alambda_handler(event, context))


async def alambda_handler(event, context: LambdaContext):
    batch = event["Records"]
    with aprocessor(records=batch, handler=record_handler):
        await aprocessor.aprocess()  # kick off processing, return list[tuple]
    return aprocessor.response()
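
For a quick local run outside Lambda, a rough sketch along these lines should work, assuming aprocessor and record_handler from the code above are in scope (because of the relative import, the file needs to be run as a module rather than a plain script). The record shape below is trimmed to the fields this example actually reads; real SQS records carry more attributes.

import asyncio
import json


def build_sqs_record(body: dict, message_id: str) -> dict:
    # minimal SQS record: only the fields the example touches
    return {"messageId": message_id, "body": json.dumps(body), "eventSource": "aws:sqs"}


async def local_run() -> None:
    records = [build_sqs_record({"item": i}, str(i)) for i in range(3)]
    with aprocessor(records=records, handler=record_handler):
        await aprocessor.aprocess()
    print(aprocessor.response())  # expect {"batchItemFailures": []}


if __name__ == "__main__":
    asyncio.run(local_run())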

Alternative solutions

No response


Issue Analytics

  • State: open
  • Created: 10 months ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
BakasuraRCE commented on Nov 23, 2022

Please check my comment in the PR @heitorlessa 😃

1 reaction
heitorlessa commented on Nov 14, 2022

That’s quite interesting!!! Back then, I had issues with async Lambda handlers, as I noticed most customers struggled to remember the order of decorators. Some tests could help here, as I don’t yet have advanced knowledge of asyncio.

At first glance, the only challenge I see here is the use of mixins: multiple inheritance doesn’t work nicely with compilation (we plan to use Mypyc). It also has a small performance overhead due to super() (and the MRO to an extent), but that’s truly microseconds.

If you could create a PR with an example of how customers could use and/or test this, I’d love to include it in this week’s release. No rush if you can’t.
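
Purely as a starting point, a rough pytest sketch could look like the following. It assumes the handler code above lives in a module importable as app (a hypothetical name), uses a hypothetical FakeLambdaContext carrying only the attributes that logger.inject_lambda_context reads, and is meant to run with POWERTOOLS_TRACE_DISABLED=1 so the Tracer stays quiet outside Lambda.

import json

from app import lambda_handler  # hypothetical module name for the handler code above


class FakeLambdaContext:
    # only the attributes that logger.inject_lambda_context reads
    function_name = "test-func"
    memory_limit_in_mb = 128
    invoked_function_arn = "arn:aws:lambda:eu-west-1:123456789012:function:test-func"
    aws_request_id = "00000000-0000-0000-0000-000000000000"


def sqs_event(bodies: list) -> dict:
    # minimal SQS event; real records carry more attributes
    return {
        "Records": [
            {"messageId": str(i), "body": json.dumps(body), "eventSource": "aws:sqs"}
            for i, body in enumerate(bodies)
        ]
    }


def test_all_records_succeed():
    result = lambda_handler(sqs_event([{"x": 1}, {"x": 2}]), FakeLambdaContext())
    assert result == {"batchItemFailures": []}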
