Athena Create Table

  • I’m submitting a …

    • 🪲 bug report
    • 🚀 feature request
    • 📚 construct library gap
    • ☎️ security issue or vulnerability => Please see policy
    • ❓ support request => Please see note at the top of this template.
  • What is the current behavior? Athena Cfn and SDKs don’t expose a friendly way to create tables

  • What is the expected behavior (or behavior of feature suggested)? I’d propose a construct that takes

    • bucket name
    • path
    • columns: list of tuples (name, type)
    • data format (probably best as an enum)
    • partitions (subset of columns)

The construct would then use the AWS SDK Custom Resource to call the Athena StartQueryExecution API with a query along these lines:

# Pseudocode sketch: columns[i][0] and columns[i][1] stand for each column's
# name and type; the 'string' values are placeholders from the boto3 request syntax.
import boto3

client = boto3.client('athena')
querystring = """
  CREATE EXTERNAL TABLE IF NOT EXISTS mydb.table_name (
    `columns[0][0]` columns[0][1],
    `columns[1][0]` columns[1][1]
    etc...
  )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES (
    {data_format}
  ) LOCATION 's3://{bucket_name}/{path}'
  TBLPROPERTIES ('has_encrypted_data'='false');
"""
response = client.start_query_execution(
    QueryString=querystring,
    ClientRequestToken='string',
    QueryExecutionContext={
        'Database': 'string'
    },
    ResultConfiguration={
        'OutputLocation': 'string',
        'EncryptionConfiguration': {
            'EncryptionOption': 'SSE_S3',  # or 'SSE_KMS' / 'CSE_KMS'
            'KmsKey': 'string'
        }
    },
    WorkGroup='string'
)
  • What is the motivation / use case for changing the behavior or adding this feature? Athena is the goddess of wisdom and civilization, how can we be a civilized developer tool if we don’t support her?

  • Please tell us about your environment: All of them

  • Other information ❤️
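
To make the proposal above more concrete, here is a minimal Python sketch of how the listed inputs (bucket name, path, columns, data format, partitions) might be rendered into the CREATE EXTERNAL TABLE statement. The function name build_create_table_ddl and all of its parameters are hypothetical, the partition columns are moved into a PARTITIONED BY clause (which the pseudocode in the issue does not show), and no identifier quoting or validation is attempted.

```python
from typing import Dict, Optional, Sequence, Tuple

Column = Tuple[str, str]  # (name, type), as in the proposal


def build_create_table_ddl(
    database: str,
    table: str,
    columns: Sequence[Column],
    bucket_name: str,
    path: str,
    partitions: Sequence[str] = (),
    serde: str = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    serde_properties: Optional[Dict[str, str]] = None,
) -> str:
    """Render a CREATE EXTERNAL TABLE statement (hypothetical helper, no escaping)."""
    type_by_name = dict(columns)
    unknown = [p for p in partitions if p not in type_by_name]
    if unknown:
        raise ValueError(f"partitions not in columns: {unknown}")

    def render(cols: Sequence[Column]) -> str:
        return ",\n    ".join(f"`{name}` {col_type}" for name, col_type in cols)

    # Partition columns are declared in PARTITIONED BY, not in the main column list.
    data_cols = [c for c in columns if c[0] not in partitions]
    part_cols = [(p, type_by_name[p]) for p in partitions]

    lines = [
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (",
        f"    {render(data_cols)}",
        ")",
    ]
    if part_cols:
        lines += ["PARTITIONED BY (", f"    {render(part_cols)}", ")"]
    lines.append(f"ROW FORMAT SERDE '{serde}'")
    if serde_properties:
        props = ", ".join(f"'{k}' = '{v}'" for k, v in serde_properties.items())
        lines.append(f"WITH SERDEPROPERTIES ({props})")
    prefix = path.strip("/")
    location = f"s3://{bucket_name}/{prefix}/" if prefix else f"s3://{bucket_name}/"
    lines.append(f"LOCATION '{location}'")
    lines.append("TBLPROPERTIES ('has_encrypted_data'='false')")
    return "\n".join(lines)


# Example inputs are made up for illustration.
ddl = build_create_table_ddl(
    database="mydb",
    table="events",
    columns=[("id", "string"), ("amount", "double"), ("day", "string")],
    bucket_name="my-bucket",
    path="events/",
    partitions=["day"],
    serde_properties={"field.delim": ","},
)
print(ddl)
```

The resulting string is what would be passed as QueryString to start_query_execution in the snippet from the issue.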

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 15
  • Comments: 16 (3 by maintainers)

Top GitHub Comments

9 reactions
hgrgic commented, Jun 24, 2021

I have recently noticed that this post is still open while looking for approaches to a similar problem.

Both @shwetajoshi601 and @tkblackbelt suggest good approaches. What worked best for me was CDK's escape hatches, which let you add property overrides for settings that the L2 constructs do not yet support.

In this case you could:

  1. Follow @shwetajoshi601's approach in the CDK definition to get as much as possible out of the CDK L2 construct.
  2. Extend that object by adding custom property overrides via CDK escape hatch code like:
CfnTable cfnTable = (CfnTable) athenaTable.getNode().getDefaultChild();
cfnTable.addPropertyOverride("TableInput.Parameters.projection\\.enabled", true);
cfnTable.addPropertyOverride("TableInput.Parameters.projection\\.year\\.type", "injected");
cfnTable.addPropertyOverride("TableInput.Parameters.projection\\.month\\.type", "injected");
cfnTable.addPropertyOverride("TableInput.Parameters.projection\\.day\\.type", "injected");
cfnTable.addPropertyOverride("TableInput.Parameters.storage\\.location\\.template", String.format("s3://%s/${year}/${month}/${day}", inputBucket.getBucketName()));

My example is in Java, but escape hatches are supported in every CDK language binding.
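
The escaped dots in the override paths above matter: in an override path an unescaped dot separates nesting levels, so a key such as projection.enabled needs its dot escaped to stay a single key under Parameters. The following is a plain-Python illustration of that path rule only; it is not CDK's implementation, the helper names split_override_path and apply_override are hypothetical, and array indices are not handled.

```python
import re
from typing import Any, Dict, List


def split_override_path(path: str) -> List[str]:
    """Split on dots not preceded by a backslash, then unescape the rest."""
    parts = re.split(r"(?<!\\)\.", path)
    return [part.replace("\\.", ".") for part in parts]


def apply_override(template: Dict[str, Any], path: str, value: Any) -> None:
    """Set a value in a nested dict, creating intermediate levels as needed."""
    keys = split_override_path(path)
    node = template
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


properties: Dict[str, Any] = {}
apply_override(properties, "TableInput.Parameters.projection\\.enabled", "true")
apply_override(properties, "TableInput.Parameters.projection\\.year\\.type", "injected")
print(properties)
```

Without the escapes, the same paths would create nested objects (projection, then enabled) instead of the flat parameter keys that partition projection expects.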

6 reactions
tkblackbelt commented, Dec 30, 2020

Found another workaround. You can do this using the CFN object (glue.CfnTable) instead of the CDK one, as shown below. The table parameters rely on Athena partition projection:

https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-athena-supports-partition-projection/

// Assumes `glue` is the CDK Glue module, e.g. import * as glue from "aws-cdk-lib/aws-glue"
// (CDK v2; CDK v1 used "@aws-cdk/aws-glue"). Names such as "something" are placeholders.
new glue.CfnTable(this, "TestTable", {
    databaseName: database.databaseName,
    catalogId: "ACCOUNT_ID",
    tableInput: {
        name: "test",
        tableType: "EXTERNAL_TABLE",
        // Partition projection settings, one group per partition key
        parameters: {
            "projection.enabled": "true",
            "projection.version.type": "enum",
            "projection.version.values": "v2",
            "projection.dataset-date.type": "date",
            "projection.dataset-date.range": "2020-12-18,NOW",
            "projection.dataset-date.format": "yyyy-MM-dd",
            "projection.dataset-date.interval": "1",
            "projection.dataset-date.interval.unit": "DAYS",
            "projection.something.type": "enum",
            "projection.something.values": "1,2,3,4,5,6,7",
            "projection.recommend.type": "enum",
            "projection.recommend.values": "true,false",
            "storage.location.template": "s3://bucket/version=${version}/dataset-date=${dataset-date}/something=${something}/recommend=${recommend}/"
        },
        storageDescriptor: {
            columns: [
                { name: "something", type: "string" },
                { name: "something", type: "string" },
                { name: "something", type: "string" },
                { name: "something", type: "string" },
                { name: "something", type: "array<string>" },
                { name: "something", type: "array<string>" },
            ],
            serdeInfo: {
                serializationLibrary: "org.apache.hive.hcatalog.data.JsonSerDe"
            },
            location: "s3://your_bucket",
            inputFormat: "org.apache.hadoop.mapred.TextInputFormat",
            outputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
        },
        partitionKeys: [
            { name: "version", type: "string" },
            { name: "dataset-date", type: "date" },
            { name: "something", type: "string" },
            { name: "recommend", type: "string" }
        ]
    }
});
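
The parameters map in the example above follows a regular pattern: a projection.<column>.type entry per partition key, a few type-specific entries, and a storage.location.template with one ${column} placeholder per key. As a rough sketch, that map could be generated rather than written by hand. The helper names below (enum_projection, date_projection, projection_parameters) are hypothetical, and only the enum and date types used in the example are covered.

```python
from typing import Dict, List, Sequence, Tuple

Projection = Tuple[str, Dict[str, str]]  # (partition column, its projection settings)


def enum_projection(column: str, values: Sequence[str]) -> Projection:
    """Settings for an enum-typed partition column."""
    return column, {
        f"projection.{column}.type": "enum",
        f"projection.{column}.values": ",".join(values),
    }


def date_projection(
    column: str,
    start: str,
    fmt: str = "yyyy-MM-dd",
    interval: int = 1,
    unit: str = "DAYS",
) -> Projection:
    """Settings for a date-typed partition column ranging from `start` to NOW."""
    return column, {
        f"projection.{column}.type": "date",
        f"projection.{column}.range": f"{start},NOW",
        f"projection.{column}.format": fmt,
        f"projection.{column}.interval": str(interval),
        f"projection.{column}.interval.unit": unit,
    }


def projection_parameters(bucket: str, projections: List[Projection]) -> Dict[str, str]:
    """Merge per-column settings and derive the Hive-style location template."""
    params: Dict[str, str] = {"projection.enabled": "true"}
    segments = []
    for column, settings in projections:
        params.update(settings)
        segments.append(f"{column}=${{{column}}}")  # e.g. version=${version}
    params["storage.location.template"] = f"s3://{bucket}/" + "/".join(segments) + "/"
    return params


params = projection_parameters(
    "bucket",
    [
        enum_projection("version", ["v2"]),
        date_projection("dataset-date", "2020-12-18"),
        enum_projection("recommend", ["true", "false"]),
    ],
)
for key, value in params.items():
    print(f"{key}: {value}")
```

The order of the list determines the order of the path segments in the template, so it should match the actual S3 layout.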