Feature request: Async BatchProcessor (use case: slow processing of each item)
See original GitHub issue

Use case
I would like to process the messages that reach a BatchProcessor asynchronously. Processing messages from an SQS queue sometimes depends on HTTP calls or similar operations that can take some time; if they are all executed concurrently, the delays do not accumulate.
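To illustrate the motivation, here is a minimal, self-contained sketch (not using Powertools; `handle_record` is a made-up stand-in for a slow HTTP call) showing that `asyncio.gather` makes the batch take roughly as long as the slowest item, instead of the sum of all items:

```python
import asyncio
import time


async def handle_record(record_id: int, delay: float) -> int:
    # Stand-in for a slow HTTP call made while processing one record
    await asyncio.sleep(delay)
    return record_id


async def main():
    delays = [0.2] * 5

    # Sequential: total time is roughly the sum of the delays (~1.0s here)
    start = time.perf_counter()
    for i, d in enumerate(delays):
        await handle_record(i, d)
    sequential = time.perf_counter() - start

    # Concurrent: total time is roughly the longest single delay (~0.2s here)
    start = time.perf_counter()
    await asyncio.gather(*(handle_record(i, d) for i, d in enumerate(delays)))
    concurrent = time.perf_counter() - start

    print(f"sequential={sequential:.2f}s concurrent={concurrent:.2f}s")
    return sequential, concurrent


asyncio.run(main())
```

This is exactly the accumulated-delay problem described above: five records that each wait 200 ms take about a second sequentially, but about 200 ms when gathered concurrently.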
Solution/User Experience
I have made a slight hack to allow this, extending the BatchProcessor class:
```python
import asyncio
import sys
from typing import List, Tuple, Union

from aws_lambda_powertools.utilities.batch import BatchProcessor, FailureResponse, SuccessResponse


class AsyncBatchProcessor(BatchProcessor):
    async def _aprocess_record(self, record: dict) -> Union[SuccessResponse, FailureResponse]:
        """
        Process a record with the instance's handler

        Parameters
        ----------
        record: dict
            A batch record to be processed.
        """
        data = self._to_batch_type(record=record, event_type=self.event_type, model=self.model)
        try:
            if self._handler_accepts_lambda_context:
                result = await self.handler(record=data, lambda_context=self.lambda_context)
            else:
                result = await self.handler(record=data)
            return self.success_handler(record=record, result=result)
        except Exception:
            return self.failure_handler(record=data, exception=sys.exc_info())

    async def aprocess(self) -> List[Tuple]:
        return list(await asyncio.gather(*[self._aprocess_record(record) for record in self.records]))
```
and the main code to run:
```python
import asyncio
import json

from aws_lambda_powertools import Logger, Tracer
from aws_lambda_powertools.utilities.batch import EventType
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
from aws_lambda_powertools.utilities.typing import LambdaContext

from .async_batch_preprocessor import AsyncBatchProcessor

tracer = Tracer()
logger = Logger()
aprocessor = AsyncBatchProcessor(event_type=EventType.SQS)

a = 1


async def record_handler(record: SQSRecord):
    """
    Process each record here
    """
    global a
    payload: str = record.body
    if payload:
        item: dict = json.loads(payload)
        print(item)
        a += 1
        print(f"sleeping for {a} s")
        await asyncio.sleep(a)
        print("awaited!")
        # code code code...


@logger.inject_lambda_context
@tracer.capture_lambda_handler
def lambda_handler(event, context: LambdaContext):
    return asyncio.run(alambda_handler(event, context))


async def alambda_handler(event, context: LambdaContext):
    batch = event["Records"]
    with aprocessor(records=batch, handler=record_handler):
        await aprocessor.aprocess()  # kick off processing, returns list[tuple]
    return aprocessor.response()
```
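The key property of the `aprocess()` approach is that each record's success or failure is captured individually, so one failing record neither aborts the batch nor disturbs record order. A minimal library-free sketch of that behavior (the function and handler names here are made up for illustration, not the Powertools API):

```python
import asyncio


async def process_record(handler, record):
    # Mirrors _aprocess_record above: the outcome is captured per record,
    # so one failing record does not abort the whole batch
    try:
        result = await handler(record)
        return ("success", record, result)
    except Exception as exc:
        return ("fail", record, exc)


async def process_batch(handler, records):
    # asyncio.gather preserves input order, so results line up with the batch
    return list(await asyncio.gather(*(process_record(handler, r) for r in records)))


async def flaky_handler(record: dict) -> str:
    if record["body"] == "boom":
        raise ValueError("cannot process")
    return record["body"].upper()


records = [{"body": "ok"}, {"body": "boom"}, {"body": "fine"}]
results = asyncio.run(process_batch(flaky_handler, records))
for status, record, payload in results:
    print(status, record["body"])
```

This order-preserving list of per-record outcomes is what lets `aprocessor.response()` build the partial-failure report for SQS after the `with` block exits.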
Alternative solutions
No response
Acknowledgment
- This feature request meets Lambda Powertools Tenets
- Should this be considered in other Lambda Powertools languages? i.e. Java, TypeScript
Issue Analytics
- Created 10 months ago
- Comments: 9 (5 by maintainers)
Please check my comment in the PR @heitorlessa 😃
That's quite interesting!!! Back then, I had issues with an async Lambda handler, as I noticed most customers struggled to remember the order of decorators. Some tests could help here, as I don't yet have advanced knowledge of asyncio.
At first glance, the only challenge I see here is the use of mixins: multiple inheritance doesn't work nicely with compilation (we plan to use Mypyc). It also has a small performance overhead due to super() (and the MRO, to an extent), but that is truly microseconds.
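To make the super()/MRO point above concrete, a minimal sketch (the class names are invented for illustration, not from the Powertools codebase): each cooperative `super().method()` call is one extra lookup along the method resolution order, which is the microsecond-scale overhead mentioned.

```python
class Base:
    def process(self) -> list:
        return ["Base"]


class AsyncMixin:
    def process(self) -> list:
        # super() here resolves through the MRO of the instance's class,
        # not through AsyncMixin's own bases
        return ["AsyncMixin"] + super().process()


class Processor(AsyncMixin, Base):
    pass


# The linearized lookup order Python walks for every super() call:
print([cls.__name__ for cls in Processor.__mro__])
# The cooperative call chain that results:
print(Processor().process())
```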
If you could create a PR with an example of how customers could use and/or test this, I'd love to include it in this week's release; no rush if you can't, too.