Athena query on a specific "Catalog id"

It is not possible to specify the catalog id in the read_sql_query function. I would like to be able to request a specific catalog id on Athena.

boto3 has accepted the catalog id through the “Catalog” key of the “QueryExecutionContext” argument since at least version 1.14.58.

A solution could be to add this in athena._utils._start_query_execution, something like:

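    # database and catalog (catalog_name would be a new, optional argument)
    if database is not None:
        args["QueryExecutionContext"] = {"Database": database}
        # only send "Catalog" when a value was actually provided
        if catalog_name is not None:
            args["QueryExecutionContext"]["Catalog"] = catalog_name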

and have the catalog id parameter available from read_sql_query.
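
As a rough illustration of the idea, the sketch below builds the keyword arguments for boto3's Athena start_query_execution call. The helper name build_query_execution_args and its catalog_name and s3_output parameters are hypothetical and not part of Wrangler; only the QueryString, QueryExecutionContext ("Database", "Catalog") and ResultConfiguration ("OutputLocation") keys come from the boto3 API. Unlike the snippet above, this sketch sets "Catalog" even when no database is given, which is an assumption about the desired behaviour.

```python
from typing import Any, Dict, Optional


def build_query_execution_args(
    sql: str,
    database: Optional[str] = None,
    catalog_name: Optional[str] = None,  # hypothetical new parameter
    s3_output: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Athena start_query_execution."""
    args: Dict[str, Any] = {"QueryString": sql}

    # Only include the keys that were actually provided, so boto3
    # never receives a None value for "Database" or "Catalog".
    context: Dict[str, str] = {}
    if database is not None:
        context["Database"] = database
    if catalog_name is not None:
        context["Catalog"] = catalog_name
    if context:
        args["QueryExecutionContext"] = context

    if s3_output is not None:
        args["ResultConfiguration"] = {"OutputLocation": s3_output}
    return args


example_args = build_query_execution_args(
    "SELECT 1",
    database="my_db",            # placeholder database name
    catalog_name="my_catalog",   # placeholder catalog name
    s3_output="s3://my-bucket/athena/",  # placeholder bucket
)
print(example_args)

# With AWS credentials configured, the dict could be passed to boto3:
#   import boto3
#   boto3.client("athena").start_query_execution(**example_args)
```

Keeping the argument building separate from the AWS call makes it possible to check the catalog handling without an AWS account.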

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
igorborgest commented, Sep 21, 2020

Actually this was a pretty useful comment. We MUST update our documentation and also improve our exception message to clarify this new ctas_approach=True limitation.

Thank you @Xiangyu-C !

1 reaction
igorborgest commented, Sep 21, 2020

Wrangler has two ways to run queries on Athena and fetch the result as a DataFrame:

ctas_approach=True (Default)

Wraps the query in a CTAS (CREATE TABLE AS SELECT) statement and then reads the table data as Parquet directly from S3.

PROS:

  • Faster for mid and big result sizes.
  • Can handle some levels of nested types.

CONS:

  • Requires create/delete table permissions on Glue.
  • Does not support timestamp with time zone.
  • Does not support columns with repeated names.
  • Does not support columns with undefined data types.
  • A temporary table will be created and then deleted immediately.

ctas_approach=False

Runs a regular query on Athena and parses the resulting CSV file on S3.

PROS:

  • Faster for small result sizes (less latency).
  • Does not require create/delete table permissions on Glue.
  • Supports timestamp with time zone.

CONS:

  • Slower for mid and big result sizes (but still faster than other libraries that use the regular Athena API).
  • Does not handle nested types at all.
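
The trade-offs above can be condensed into a small decision helper. This is only a sketch: choose_ctas_approach and its parameters are hypothetical names, not part of Wrangler, and the rule simply treats each listed CONS item of ctas_approach=True as a blocker. It ignores result size and nested types, which also matter according to the lists above.

```python
def choose_ctas_approach(
    can_manage_glue_tables: bool,
    has_timestamp_with_time_zone: bool = False,
    has_repeated_column_names: bool = False,
    has_undefined_types: bool = False,
) -> bool:
    """Return True when ctas_approach=True looks usable for a query.

    Hypothetical helper: each argument mirrors one limitation listed
    for ctas_approach=True in the comment above.
    """
    blockers = (
        not can_manage_glue_tables,    # CTAS creates and deletes a temporary table
        has_timestamp_with_time_zone,  # not supported with CTAS
        has_repeated_column_names,     # not supported with CTAS
        has_undefined_types,           # not supported with CTAS
    )
    return not any(blockers)


use_ctas = choose_ctas_approach(
    can_manage_glue_tables=True,
    has_timestamp_with_time_zone=True,
)
print(use_ctas)

# With AWS credentials configured, the flag could be passed to Wrangler:
#   import awswrangler as wr
#   df = wr.athena.read_sql_query(
#       "SELECT * FROM my_table",  # placeholder query
#       database="my_db",          # placeholder database
#       ctas_approach=use_ctas,
#   )
```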
