Athena query on a specific "Catalog id"

It is not possible to specify the catalog id in the read_sql_query function. I would like to be able to request a specific catalog id on Athena.

boto3 has supported the catalog id parameter in the “QueryExecutionContext” argument since at least version 1.14.58.

A solution could be to add this in athena._utils._start_query_execution, something like:

    # database
    if database is not None:
        args["QueryExecutionContext"] = {"Database": database, "Catalog": catalog_name}

and have the catalog id parameter available from read_sql_query.
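As a rough illustration of the proposal above, here is a minimal sketch of how the keyword arguments for boto3's Athena `start_query_execution` call could be assembled. The helper name `build_start_query_args` and its parameters are hypothetical and are not part of Wrangler. Unlike the snippet above, it only sets "Catalog" when a value is given, on the assumption that boto3's parameter validation rejects a `None` value.

```python
from typing import Any, Dict, Optional


def build_start_query_args(
    sql: str,
    database: Optional[str] = None,
    catalog_id: Optional[str] = None,
    s3_output: Optional[str] = None,
) -> Dict[str, Any]:
    """Build kwargs for boto3's Athena start_query_execution (hypothetical helper)."""
    args: Dict[str, Any] = {"QueryString": sql}

    # Only include the keys that were actually provided, so that no
    # None values are handed to boto3.
    context: Dict[str, str] = {}
    if database is not None:
        context["Database"] = database
    if catalog_id is not None:
        context["Catalog"] = catalog_id
    if context:
        args["QueryExecutionContext"] = context

    if s3_output is not None:
        args["ResultConfiguration"] = {"OutputLocation": s3_output}
    return args


# Possible usage (requires boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   client = boto3.client("athena")
#   response = client.start_query_execution(
#       **build_start_query_args("SELECT 1", "my_db", "123456789012", "s3://my-bucket/out/")
#   )
```

The database name, catalog id and bucket in the usage comment are placeholders.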

Issue Analytics

State:
Created 3 years ago
Comments: 7 (5 by maintainers)

Top GitHub Comments

igorborgest commented, Sep 21, 2020

Actually this was a pretty useful comment. We MUST update our documentation and also improve our exception message to clarify this new ctas_approach=True limitation.

Thank you @Xiangyu-C !

igorborgest commented, Sep 21, 2020

Wrangler has two ways to run queries on Athena and fetch the result as a DataFrame:

ctas_approach=True (Default)

Wraps the query in a CTAS statement and then reads the table data as Parquet directly from S3.

PROS:

Faster for medium and large result sizes.
Can handle some levels of nested types.

CONS:

Requires create/delete table permissions on Glue.
Does not support timestamp with time zone.
Does not support columns with repeated names.
Does not support columns with undefined data types.
A temporary table will be created and then deleted immediately.

ctas_approach=False

Runs a regular query on Athena and parses the regular CSV result on S3.

PROS:

Faster for small result sizes (less latency).
Does not require create/delete table permissions on Glue.
Supports timestamp with time zone.

CONS:

Slower (but still faster than other libraries that use the regular Athena API).
Does not handle nested types at all.
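
The trade-offs listed above can be condensed into a small decision helper. This is only a sketch of one possible heuristic: the function `pick_ctas_approach` and its flags are hypothetical, and the rules are a direct reading of the PROS/CONS in this comment rather than anything Wrangler does internally.

```python
def pick_ctas_approach(
    can_manage_glue_tables: bool,
    has_tz_timestamps: bool = False,
    has_duplicate_or_untyped_columns: bool = False,
    has_nested_types: bool = False,
    small_result: bool = False,
) -> bool:
    """Return a value for ctas_approach based on the trade-offs above (hypothetical)."""
    # Hard limitations of ctas_approach=True listed under CONS.
    ctas_blocked = (
        not can_manage_glue_tables
        or has_tz_timestamps
        or has_duplicate_or_untyped_columns
    )
    if ctas_blocked:
        if has_nested_types:
            # The regular approach does not handle nested types at all.
            raise ValueError("Neither approach fits: CTAS is blocked and nested types are present.")
        return False
    if has_nested_types:
        # Only the CTAS approach can handle some levels of nested types.
        return True
    # Otherwise it is a latency question: regular queries are faster for small results.
    return not small_result


# Possible usage (requires awswrangler and AWS credentials, so it is left as a comment):
#   import awswrangler as wr
#   df = wr.athena.read_sql_query(
#       "SELECT * FROM my_table",
#       database="my_db",
#       ctas_approach=pick_ctas_approach(can_manage_glue_tables=True, has_tz_timestamps=True),
#   )
```

The table and database names in the usage comment are placeholders.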
