Missing classification-parameter when creating table in Glue

Hey. I haven’t reported bugs before, so I hope I’m doing things correctly here.

When creating a Glue table using aws_cdk.aws_glue.Table with data_format = _glue.DataFormat.JSON, the table's classification is set to Unknown, and querying the table fails.

Reproduction Steps

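from aws_cdk import aws_glue as _glue

# Inside a Stack; s3_bucket and account_id are defined elsewhere
glue_table = _glue.Table(
    self, 'GlueTable',
    database=_glue.Database.from_database_arn(
        self, 'GlueDatabase',
        'arn:aws:glue:region:{}:database/abc'.format(account_id),
    ),
    table_name='def_ghi',
    data_format=_glue.DataFormat.JSON,
    bucket=s3_bucket,
    s3_prefix='prefix/',
    # remaining arguments (e.g. columns) were omitted in the original report
)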

If I manually add a “classification” property with the value “json” to the table properties after deploying with CDK, the query works fine.
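
One way to avoid the manual step might be a CloudFormation escape hatch on the resource that the Table construct generates. The sketch below assumes the construct exposes its underlying CfnTable as node.default_child and that overriding TableInput.Parameters.classification has the same effect as the manual edit. The helper name add_classification is hypothetical, and this has not been verified against Framework Version 1.37.0.

```python
def add_classification(table, value="json"):
    """Set the 'classification' table parameter on the synthesized template.

    `table` is expected to be an aws_cdk.aws_glue.Table (assumption); its
    underlying CloudFormation resource is reached via node.default_child.
    """
    cfn_table = table.node.default_child
    # Property path follows the AWS::Glue::Table CloudFormation schema
    cfn_table.add_property_override(
        "TableInput.Parameters.classification", value
    )
    return cfn_table
```

It would be called right after the table is defined, for example add_classification(glue_table), so the parameter is part of the deployed template rather than a post-deploy change.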

Error Log

Amazon Invalid operation: Invalid DataCatalog response for external table “abc”.“def_ghi”: Cannot deserialize table. Missing mandatory field: Parameters in response from external catalog. ;

Environment

  • CLI Version:
  • Framework Version: 1.37.0
  • OS: Windows 10
  • Language: Python

This is 🐛 Bug Report

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

4 reactions
vuchetichbalint commented, Jun 23, 2021

This is still an issue. Is there any update on it?

2 reactions
jorgenfroland commented, May 6, 2020

To get around this I have added a post-deploy code snippet using boto3 to update the table, like this:

import boto3

glue_client = boto3.client('glue')

# Fetch the table definition that CDK deployed
response = glue_client.get_table(
    DatabaseName=database_name,
    Name=table_name,
)
table = response['Table']

# Give the SerDe an explicit (empty) Parameters map
table['StorageDescriptor']['SerdeInfo']['Parameters'] = {}
# Not necessary, but removes the "classification: Unknown"
table['Parameters']['classification'] = 'json'

# get_table returns read-only fields too, so pass only the ones needed
glue_client.update_table(
    DatabaseName=table['DatabaseName'],
    TableInput={
        'Name': table['Name'],
        'Description': table['Description'],
        'Retention': table['Retention'],
        'StorageDescriptor': table['StorageDescriptor'],
        'TableType': table['TableType'],
        'Parameters': table['Parameters'],
    },
)
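
The same workaround could be wrapped in a small helper that tolerates tables lacking optional fields such as Description. This is only a sketch: the function name set_classification and the TABLE_INPUT_KEYS subset are assumptions made for illustration, and unlike the snippet above it keeps any existing SerDe parameters instead of replacing them with an empty map.

```python
# Subset of the keys accepted by Glue's TableInput structure (assumed
# sufficient here); read-only fields such as CreateTime are left out.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "Retention",
    "StorageDescriptor", "PartitionKeys", "TableType", "Parameters",
)


def set_classification(glue_client, database_name, table_name,
                       classification="json"):
    """Copy the writable fields of a table and set its classification."""
    table = glue_client.get_table(
        DatabaseName=database_name, Name=table_name
    )["Table"]

    # Keep only fields that exist on the table and are valid in TableInput
    table_input = {k: table[k] for k in TABLE_INPUT_KEYS if k in table}

    params = dict(table_input.get("Parameters") or {})
    params["classification"] = classification
    table_input["Parameters"] = params

    # Ensure SerdeInfo has a Parameters map without discarding existing ones
    serde = table_input.get("StorageDescriptor", {}).get("SerdeInfo")
    if serde is not None:
        serde.setdefault("Parameters", {})

    glue_client.update_table(
        DatabaseName=database_name, TableInput=table_input
    )
    return table_input
```

With a real client this would be invoked as set_classification(boto3.client('glue'), database_name, table_name) once the CDK deployment has finished.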