Lambda function to execute a query on Athena and store the results back in S3
Hi, here is what I am trying to achieve.
- I have an application writing to AWS DynamoDB -> a Kinesis writing to an S3 bucket.
- I use Athena to query the data in S3, based on monthly/daily buckets, to create a table of cleaned-up data from S3 (extracting the required strings from the CSV stored in S3).
- AWS Athena does not allow INSERT_INTO/INSERT_OVERWRITE to modify the table contents. Hence I am going the Lambda way: run a query on the Athena-created table and store the result back in S3, which I can then use to create visualizations in AWS QuickSight.
- This is the Lambda function:
import boto3

def lambda_handler(event, context):
    query_1 = "Select REGEXP_EXTRACT(data,'[a-z]*[0-9]') as datacenter,\
        REGEXP_EXTRACT(response_code,'[0-9]+') CODE, \
        REGEXP_EXTRACT(pool_id,'[a-z]*[0-9]+') as TOWER,\
        CASE \
            WHEN response_code like '%2%' THEN '1' \
            WHEN response_code like '%3%' THEN '1' \
            WHEN response_code like '%4%' THEN '1' \
            ELSE '0' \
        END as STATUS \
        FROM probe_result_v3.cwsproberesults \
        WHERE pool_id like 'POOL_ID%';"
    database = "xxx-xxx-xx"
    s3_output = "s3://xxxx-results/"
    client = boto3.client('athena')
    response = client.start_query_execution(QueryString = query_1,
        QueryExecutionContext={
            'Database': database
        },
        ResultConfiguration={
            'OutputLocation': 's3://xxxx-results/resultfolder/'
        }
    )
    return response
The execution log from Lambda reports success. Response:
{
  "QueryExecutionId": "d8f8104f-407c-4eff-b57d-b9bbf57e5196",
  "ResponseMetadata": {
    "RetryAttempts": 0,
    "HTTPStatusCode": 200,
    "RequestId": "2e6f5d29-43b2-11e8-862c-077a4462e1c2",
    "HTTPHeaders": {
      "date": "Thu, 19 Apr 2018 09:15:19 GMT",
      "x-amzn-requestid": "2e6f5d29-43b2-11e8-862c-077a4462e1c2",
      "content-length": "59",
      "content-type": "application/x-amz-json-1.1",
      "connection": "keep-alive"
    }
  }
}
- However, when I go back to s3://xxxx-results/resultfolder/, I see nothing created.
- When I execute the query on its own from the Athena query editor, I see the CSV created in the S3 bucket location, but that is an on-demand query, and I am trying to schedule this so that I can use it in QuickSight for an hourly graph.
Could you please help me fix this?
Issue Analytics
- State:
- Created 5 years ago
- Comments:13 (1 by maintainers)
Top GitHub Comments
Thought I would chime in, as I was involved in getting this resolved. It was the simplest case of not having proper IAM permissions: the Lambda role had no S3 permissions, and the call wasn't raising an exception when it fired
response = client.start_query_execution(QueryString = query_1, QueryExecutionContext={ 'Database': database }, ResultConfiguration={ 'OutputLocation': 's3://xxxx-results/resultfolder/' } )
The response does not fail: the call submits the query to Athena with the ResultConfiguration and assumes the job is done. Because it is an asynchronous call, it has no way of knowing whether the output was actually written to the S3 bucket. There are, obviously, programmatic solutions to handle this.
How we found it: trial and error. Among a few other steps that didn't help, I replicated her setup in my test environment and gave the Lambda role "Full Admin" to isolate the cause. From there we figured it out. Hope that helps. @snehamirajkar Sorry… thought I would answer this to help the community.
Could you write up the solution, or give me a hint of what you did? That would be very helpful! 😄
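As a hint for anyone landing here: the thread does not show the exact code that was used, but one programmatic way to handle the asynchronous call described in the first comment is to poll the query status after start_query_execution returns. The sketch below is only an illustration under assumptions. The helper name wait_for_query and its poll_seconds and max_polls parameters are made up for this example; the only Athena API it relies on is get_query_execution, called on whatever client is passed in (normally boto3.client('athena')).

```python
import time


def wait_for_query(client, query_execution_id, poll_seconds=2.0, max_polls=30):
    """Poll Athena until the query reaches a terminal state.

    `client` is assumed to be boto3.client('athena'), or any object that
    exposes a compatible get_query_execution method.
    Returns the S3 output location on success; raises RuntimeError if the
    query fails, is cancelled, or does not finish within max_polls checks.
    """
    for _ in range(max_polls):
        execution = client.get_query_execution(
            QueryExecutionId=query_execution_id
        )["QueryExecution"]
        status = execution["Status"]
        state = status["State"]

        if state == "SUCCEEDED":
            return execution["ResultConfiguration"]["OutputLocation"]

        if state in ("FAILED", "CANCELLED"):
            # StateChangeReason is optional, so fall back to a placeholder.
            reason = status.get("StateChangeReason", "no reason reported")
            raise RuntimeError(
                f"Athena query {query_execution_id} {state}: {reason}"
            )

        # Still QUEUED or RUNNING: wait before checking again.
        time.sleep(poll_seconds)

    raise RuntimeError(
        f"Athena query {query_execution_id} did not finish "
        f"after {max_polls} polls"
    )
```

In the handler from the question, this would be called right after start_query_execution, for example wait_for_query(client, response['QueryExecutionId']), so that a query Athena marks as failed (which is how a role that cannot write to the output location would be expected to show up) surfaces as a Lambda error instead of an HTTP 200. The Lambda timeout has to be long enough to cover the polling, and the role still needs S3 permissions on the results location for the query itself to succeed.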