Querying DynamoDB data with Athena


Which Category is your question related to? Custom

What AWS Services are you utilizing? API, Lambda, Auth

Provide additional details e.g. code snippets

I have a requirement to run complex analytics over the data that we are storing in DynamoDB. Specifically, joining together data across Amplify-generated tables in a multi-tenant environment: finding top-performing “factors” per account, per team and per user (amongst other more complex requirements). The specifics of how the data connects in Dynamo aren’t necessarily important. What I am struggling with is the best way to get my DynamoDB data into place for querying with Athena (and likely in the future with QuickSight and automated analysis).

There are guides out there for how to provide the results to the user via AppSync (https://aws.amazon.com/blogs/mobile/visualizing-big-data-with-aws-appsync-amazon-athena-and-aws-amplify/), but I can’t seem to find much out there to help in getting my data to S3.

So this brings me to the question(s): which method would be best, how would I go about implementing it, and how should I format the data in S3? The options that I am considering are the following…

  1. DynamoStream (@model backed) => Lambda => S3
  2. DynamoStream (@model backed) => Lambda => Firehose => S3
  3. Glue ETL => S3

Has anyone else gone through a similar scenario? Are there any docs out there that I’ve missed? I have reached the edge of my experience in getting to this point, so before I embark on another learning curve, I thought it would be best to get some advice.

Thanks in advance!


Just as a note, I am not interested in storing pre-calculated metrics at this point as a lot of the analysis will be exploratory at first. So doing calculations in a Lambda resolver or storing post-calculated metrics off the back of a Dynamo stream is a no-no for us right now.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

1 reaction
kaustavghosh06 commented, Oct 22, 2020

@jonperryxlm Apologies for the late response. We don’t support ETL solutions out of the box today with the Amplify CLI. This is an interesting use case, and I’ll mark it as a feature-request for our team to consider. Having said that, we do support the first half of your ask, which is the “DynamoStream (@model backed) => Lambda” integration. In the Lambda you can then perform your desired ETL operation, publishing the results either directly to S3 for further analysis or via Firehose -> S3. For managing it within the infrastructure in the CLI itself, have you considered using custom stacks from the CLI (https://docs.amplify.aws/cli/usage/customcf)? Also, please let us know if you have any issues around the “DynamoStream (@model backed) => Lambda” integration. You can find more info about it here: https://docs.amplify.aws/cli/usage/lambda-triggers#dynamodb-lambda-triggers
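
For anyone trying the Lambda => S3 route described above, here is a rough Python sketch of what the ETL step could look like. It is not an official Amplify pattern, only one possible shape. It assumes the stream delivers new images (the `NewImage` field is only present for stream view types that include it), and the environment variables `ANALYTICS_BUCKET` and `TABLE_NAME`, the `dt=` date partition layout, and all function names are hypothetical choices made for illustration. Each batch is flattened from DynamoDB JSON into newline-delimited JSON, which Athena can read with a JSON SerDe.

```python
import json
import os
from datetime import datetime, timezone
from decimal import Decimal


def from_dynamo(value):
    """Convert one DynamoDB-JSON attribute value ({"S": "x"}, {"N": "1"}, ...) to plain Python."""
    (type_tag, raw), = value.items()
    if type_tag in ("S", "BOOL"):
        return raw
    if type_tag == "N":
        num = Decimal(raw)
        return int(num) if num == num.to_integral_value() else float(num)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamo(item) for item in raw]
    if type_tag == "M":
        return {key: from_dynamo(item) for key, item in raw.items()}
    if type_tag == "SS":
        return sorted(raw)
    if type_tag == "NS":
        return sorted(from_dynamo({"N": item}) for item in raw)
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def record_to_row(record):
    """Flatten one stream record; returns None when there is no NewImage (e.g. REMOVE events)."""
    image = record["dynamodb"].get("NewImage")
    if image is None:
        return None
    row = {name: from_dynamo(value) for name, value in image.items()}
    row["_event_name"] = record["eventName"]
    return row


def s3_key(table, epoch_seconds, event_id):
    """Build a Hive-style, date-partitioned key (layout is an assumption, not a requirement)."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{table}/dt={day}/{event_id}.json"


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


def handler(event, context):
    """Lambda entry point: write each stream batch as one NDJSON object in S3."""
    import boto3  # imported lazily so the pure helpers above can be tested without it

    records = event["Records"]
    rows = [row for row in map(record_to_row, records) if row is not None]
    if not rows:
        return {"written": 0}
    first = records[0]
    key = s3_key(
        os.environ["TABLE_NAME"],  # hypothetical env var
        first["dynamodb"]["ApproximateCreationDateTime"],
        first["eventID"],
    )
    boto3.client("s3").put_object(
        Bucket=os.environ["ANALYTICS_BUCKET"],  # hypothetical env var
        Key=key,
        Body=to_ndjson(rows).encode("utf-8"),
    )
    return {"written": len(rows)}
```

Deriving the object key from the stream `eventID` means a retried batch overwrites the same object rather than adding a second copy. Note that this writes change events, not the current state of each item, so an item that is modified several times will appear several times and queries would need to pick the latest version. For the Firehose variant, the same NDJSON lines could be sent to a delivery stream instead of calling `put_object`, leaving batching and file sizing to Firehose.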

1 reaction
davidbiller commented, Aug 24, 2020

@houmark

How did you solve the duplicate data problem? If I run the job more than once, the data doubles.


Top Results From Across the Web

Amazon Athena DynamoDB connector - AWS Documentation
The Amazon Athena DynamoDB connector enables Amazon Athena to communicate with DynamoDB so that you can query your tables with SQL.

Using Athena data connectors to visualize DynamoDB ...
Create the DynamoDB Athena Data Connector · Navigate to the Data Sources tab of the Athena console and choose "Connect data source" button....

Amazon DynamoDB support with Amazon Athena - Dataedo
Dataedo supports connector for Amazon Athena, a query engine that allows querying various data sources on Amazon Web Services.

DynamoDB Export to S3 and Query with Athena - Brian Pfeil
example exporting dynamodb table to S3 and then querying via athena ... Create external table in athena pointing to exported S3 data CREATE ......

Querying Data from DynamoDB in Amazon Athena - Medium
Amazon Athena now enables users to run SQL queries across data stored in relational, non-relational, object, and custom data sources. With federated querying, ......
