Lambda function to execute a query on Athena and store the results back in S3


Hi, here is what I am trying to do.

  1. I have an application writing to AWS DynamoDB -> Kinesis, which writes to an S3 bucket.
  2. I use Athena to query the data in S3, in monthly or daily buckets, and create a table of cleaned-up data (extracting the required strings from the CSV files stored in S3).
  3. Athena does not allow INSERT_INTO/INSERT_OVERWRITE to modify table contents, so I am using Lambda to run a query on the Athena-created table and store the result back in S3, where I can use it to create visualizations in AWS QuickSight.
  4. This is the Lambda function:
import boto3


def lambda_handler(event, context):
    query_1 =   "Select REGEXP_EXTRACT(data,'[a-z]*[0-9]') as datacenter,\
                 REGEXP_EXTRACT(response_code,'[0-9]+') CODE, \
                 REGEXP_EXTRACT(pool_id,'[a-z]*[0-9]+') as TOWER,\
                 CASE \
                 WHEN response_code like '%2%' THEN '1' \
                 WHEN response_code like '%3%' THEN '1' \
                 WHEN response_code like '%4%' THEN '1' \
                 ELSE '0' \
                 END as STATUS \
                 FROM probe_result_v3.cwsproberesults \
                 WHERE pool_id like 'POOL_ID%';"
                 
    database = "xxx-xxx-xx"
    s3_output = "s3://xxxx-results/"

    client = boto3.client('athena')

    response = client.start_query_execution(QueryString = query_1,
                                        QueryExecutionContext={
                                            'Database': database
                                        },
                                        ResultConfiguration={
                                            'OutputLocation': 's3://xxxx-results/resultfolder/'
                                        }
                                        )
    return response

The execution log from Lambda reports success. Response:

{
"QueryExecutionId": "d8f8104f-407c-4eff-b57d-b9bbf57e5196",
"ResponseMetadata": {
"RetryAttempts": 0,
"HTTPStatusCode": 200,
"RequestId": "2e6f5d29-43b2-11e8-862c-077a4462e1c2",
"HTTPHeaders": {
"date": "Thu, 19 Apr 2018 09:15:19 GMT",
"x-amzn-requestid": "2e6f5d29-43b2-11e8-862c-077a4462e1c2",
"content-length": "59",
"content-type": "application/x-amz-json-1.1",
"connection": "keep-alive"
}
}
}
  5. However, when I go back to s3://xxxx-results/resultfolder/, I see nothing created.
  6. When I execute the query on its own from the Athena query editor, I see the CSV created in the S3 bucket location. But that is an on-demand query, and I am trying to schedule it so that I can use the output in QuickSight for an hourly graph.

Can you please help me fix this?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 13 (1 by maintainers)

Top GitHub Comments

3 reactions
warpspeed6 commented, Aug 23, 2018

I thought I would chime in, as I was involved in getting this resolved. It was the simple case of not having proper IAM permissions: the Lambda role had no S3 permissions, and the call was not raising an exception. When the function fires

    response = client.start_query_execution(QueryString=query_1, QueryExecutionContext={'Database': database}, ResultConfiguration={'OutputLocation': 's3://xxxx-results/resultfolder/'})

the response does not fail. start_query_execution is an asynchronous call: it submits the query to Athena with the given ResultConfiguration and returns as if the job were done, so the function has no way of knowing whether the output was actually written to the S3 bucket. There are programmatic ways to handle this.
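One way to surface this kind of silent failure is to poll the query state after submitting it. The sketch below is not from the thread: the helper names `wait_for_query` and `run_query`, the polling interval, and the attempt limit are all assumptions chosen for illustration. It relies only on the Athena client's `start_query_execution` and `get_query_execution` calls, and takes the client as a parameter so it can be exercised without AWS access.

```python
import time

# States after which Athena will not change the query status any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(client, query_execution_id, poll_seconds=2.0, max_attempts=30):
    """Poll Athena until the query reaches a terminal state; return its Status dict."""
    for _ in range(max_attempts):
        execution = client.get_query_execution(QueryExecutionId=query_execution_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Query {query_execution_id} did not finish in time")


def run_query(client, query, database, output_location, poll_seconds=2.0):
    """Submit a query and raise if Athena reports anything other than SUCCEEDED."""
    started = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    status = wait_for_query(client, query_id, poll_seconds=poll_seconds)
    if status["State"] != "SUCCEEDED":
        # StateChangeReason typically carries the error text, such as an S3 access problem.
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Athena query {query_id} {status['State']}: {reason}")
    return query_id
```

Inside the handler this would be called as `run_query(boto3.client('athena'), query_1, database, 's3://xxxx-results/resultfolder/')`. The Lambda timeout then has to be long enough to cover the polling loop; for long-running queries, checking the status from a separate invocation may be a better fit.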

How we found it: trial and error. Among a few other steps that didn’t help, I replicated her setup in my test environment and gave the Lambda role “Full Admin” to isolate the cause. From there we figured it out. Hope that helps. @snehamirajkar Sorry… I thought I would answer this to help the community.
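The thread does not say which permissions were finally granted, so the following is only a sketch of a narrower alternative to “Full Admin”. It builds an IAM policy document for the Lambda role as a Python dictionary. The function name `athena_results_policy` is hypothetical, and the action list is an assumption based on what Athena commonly needs to write query results; access to the source data bucket and to the Glue Data Catalog would have to be added separately.

```python
import json


def athena_results_policy(bucket, prefix=""):
    """Build a policy document letting a role run Athena queries and write results to S3."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Submit queries and check on them.
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",
            },
            {
                # Bucket-level actions on the results bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions under the results prefix.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": object_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Bucket and prefix match the placeholders used in the question.
    print(json.dumps(athena_results_policy("xxxx-results", "resultfolder/"), indent=2))
```

The printed JSON can be attached to the Lambda execution role as an inline policy. Starting from a narrow policy and widening it only when Athena reports an access error keeps the role closer to least privilege than the admin role used for diagnosis.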

1 reaction
palaciosc commented, Jul 20, 2018

Could you write up the solution, or give me a hint of what you did? That would be very helpful! 😄

