
Feature request: Data Class for Cloudwatch Logs passed through Kinesis Stream


Use case

We currently pass CloudWatch logs to a Kinesis stream, where a Lambda function processes them. We’d like to use the cloud_watch_logs_event data class, but it only works when the event source is CloudWatch Logs directly, not Kinesis. When CloudWatch Logs targets Lambda, it seems to wrap the payload in the following superstructure: {'awslogs': {'data': <PAYLOAD>}}

The existing CloudWatchLogsEvent class’s raw_logs_data property expects this structure when it unpacks the data.

This superstructure is absent when the logs are passed through Kinesis.
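To make the difference concrete, here is a minimal standard-library sketch of the two event shapes. The sample payload values and the decode_blob helper are hypothetical, and the Kinesis record shows only the keys relevant here. It assumes the Base64 text under kinesis.data carries the same gzip-compressed JSON that awslogs.data carries in the direct case.

```python
import base64
import gzip
import json

# Hypothetical sample of a decoded CloudWatch Logs payload.
decoded = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/aws/lambda/example",
    "logStream": "example-stream",
    "logEvents": [{"id": "1", "timestamp": 1662508800000, "message": "hello"}],
}

# The payload travels as gzip-compressed JSON, carried as Base64 text.
blob = base64.b64encode(gzip.compress(json.dumps(decoded).encode("utf-8"))).decode("ascii")

# Shape when CloudWatch Logs invokes Lambda directly.
direct_event = {"awslogs": {"data": blob}}

# Shape when the payload arrives through a Kinesis stream (abridged record).
kinesis_event = {"Records": [{"eventSource": "aws:kinesis", "kinesis": {"data": blob}}]}


def decode_blob(data: str) -> dict:
    """Base64-decode, gunzip, and JSON-parse one blob (hypothetical helper)."""
    return json.loads(gzip.decompress(base64.b64decode(data)))


direct_logs = decode_blob(direct_event["awslogs"]["data"])
kinesis_logs = decode_blob(kinesis_event["Records"][0]["kinesis"]["data"])
```

Only the path to the blob differs between the two shapes, which is why a class that hard-codes the awslogs path cannot read the Kinesis variant.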

Solution/User Experience

One option is a new method or property on the CloudWatchLogsEvent class, perhaps named parse_logs_kinesis(), that unpacks this type of payload.

However, that may be less than ideal, since accidentally using the existing raw_logs_data property would then cause issues. An entirely new class (for example, CloudWatchLogsKinesisEvent) may therefore be preferable.

A third option would be to rewrite the existing raw_logs_data property so that it checks for the presence of the CloudWatch -> Lambda superstructure: return self['awslogs']['data'] if self.get('awslogs') else self[0] (safe retrieval from nested dicts can get a bit clunky).
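The shape check in the third option could look roughly like the sketch below. It is written against plain dicts rather than the Powertools classes, so the raw_logs_data and parse_logs_data functions here are hypothetical stand-ins and not the library’s API. It also assumes the Kinesis event keeps its blobs under Records[n].kinesis.data.

```python
import base64
import gzip
import json
from typing import Any, Dict, List


def raw_logs_data(event: Dict[str, Any]) -> List[str]:
    """Return the Base64 blobs from either event shape (hypothetical helper)."""
    awslogs = event.get("awslogs")
    if awslogs:
        # Direct CloudWatch Logs -> Lambda invocation: a single blob.
        return [awslogs["data"]]
    # Kinesis-delivered event: one blob per Kinesis record.
    return [r["kinesis"]["data"] for r in event.get("Records", []) if "kinesis" in r]


def parse_logs_data(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decode every blob found in the event into its JSON payload."""
    return [json.loads(gzip.decompress(base64.b64decode(b))) for b in raw_logs_data(event)]
```

Because a Kinesis event can carry several records, this sketch returns a list instead of a single string. That mismatch is one reason a shape check inside the existing single-value property may be awkward.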

Note: Originally posted in discussions forum here.

Alternative solutions

No response

Acknowledgment

This feature request meets Lambda Powertools Tenets
Should this be considered in other Lambda Powertools languages? i.e. Java, TypeScript

Issue Analytics

Created a year ago
Comments: 21 (12 by maintainers)

Top GitHub Comments

2 reactions

blewinter commented, Sep 7, 2022

Hi! Apologies, I wasn’t receiving notifications on this thread. I’d love to contribute! Will just need a bit of time to get set up. Do you have thoughts on what the best approach might be, or should I post on Discord?

1 reaction

blewinter commented, Oct 12, 2022

I don’t have bandwidth for a full PR, but it would essentially look like this:

from aws_lambda_powertools.utilities.data_classes.cloud_watch_logs_event import CloudWatchLogsEvent, CloudWatchLogsDecodedData
from aws_lambda_powertools.utilities.data_classes.kinesis_stream_event import KinesisStreamRecordPayload
from aws_lambda_powertools.utilities.data_classes.common import DictWrapper


class CloudWatchKinesisLogEvents(DictWrapper):

    def __init__(self, kinesis_payload: KinesisStreamRecordPayload):
        # Rebuild the envelope that CloudWatch -> Lambda would have delivered.
        # Use the still Base64-encoded `data` field rather than data_as_bytes():
        # CloudWatchLogsEvent Base64-decodes the value itself before decompressing.
        self.wrapped = {"awslogs": {"data": kinesis_payload.data}}
        super().__init__(self.wrapped)

    @property
    def raw_logs_data(self) -> str:
        """The value of the `data` field is a Base64 encoded ZIP archive."""
        return self.wrapped["awslogs"]["data"]

    def parse_logs_data(self) -> CloudWatchLogsDecodedData:
        # Delegate decoding and decompression to the existing data class.
        return CloudWatchLogsEvent(self.wrapped).parse_logs_data()

So the usage would be essentially identical, except that a KinesisStreamRecordPayload is passed at initialization. I think that explicitness is useful in this case, but it also adds a dependency, so which approach is better comes down to preference. (By the way, I’d expect that in real-world usage someone would write a handler for a KinesisStreamEvent, unpack the records, and pass each payload to create this object.)
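The handler flow described above (unpack the Kinesis records, then decode each payload) can be sketched with only the standard library, so it runs without Powertools installed. The names decode_record and handler are hypothetical. Skipping records whose messageType is not DATA_MESSAGE is an assumption, based on CloudWatch Logs occasionally sending CONTROL_MESSAGE records to subscription destinations.

```python
import base64
import gzip
import json
from typing import Any, Dict, List


def decode_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Decode one Kinesis record carrying a CloudWatch Logs payload."""
    compressed = base64.b64decode(record["kinesis"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event: Dict[str, Any], context: Any = None) -> List[str]:
    """Collect log messages from every data-bearing record in the batch."""
    messages: List[str] = []
    for record in event.get("Records", []):
        logs = decode_record(record)
        # Assumption: only DATA_MESSAGE payloads contain real log events.
        if logs.get("messageType") != "DATA_MESSAGE":
            continue
        messages.extend(e["message"] for e in logs.get("logEvents", []))
    return messages
```

With the data class sketched in the comment, the loop body would instead construct one object per record payload and call parse_logs_data() on it.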