Change in way read_glue_table addresses tables

Describe the bug

Previously, I used the following code without issue:

item_ids = wr.s3.read_parquet(path=f'{processed_data_bucket}/distinct_path')
keys = list(wr.s3.read_parquet_table(database=gluedatabase, table='path_tokens').itemid.unique())

Now, when I run the same code, it looks like awswrangler has changed the way it refers to parquet file locations, because I get:

item_ids = wr.s3.read_parquet(path=f'{processed_data_bucket}/distinct_path')
...
InvalidArgumentValue: '<redacted s3 path that does not start with s3://>/distinct_path' is not a valid path. It MUST start with 's3://'

This one can obviously be fixed by changing the specified path. However, I am not sure how to resolve the read_parquet_table issue:

keys = list(wr.s3.read_parquet_table(database=gluedatabase, table='path_tokens').itemid.unique())
...
InvalidArgumentValue: '<redacted s3 path that does not start with s3://>/path_tokens' is not a valid path. It MUST start with 's3://'
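For the first error, one possible workaround is to normalize the path before passing it to wr.s3.read_parquet. The sketch below assumes that processed_data_bucket holds a bare bucket name or bucket/prefix string without a scheme; the helper name ensure_s3_uri is hypothetical and not part of awswrangler.

```python
def ensure_s3_uri(path: str) -> str:
    """Return `path` with an 's3://' scheme, adding one only if it is missing.

    Hypothetical helper: assumes a scheme-less input is a bucket name or
    'bucket/prefix' string, which is an assumption about the redacted value.
    """
    if path.startswith("s3://"):
        return path
    # Drop stray leading slashes so we never produce 's3:///bucket/...'.
    return "s3://" + path.lstrip("/")


# Example with a made-up bucket name (not from the original issue).
processed_data_bucket = "my-bucket/processed"
print(ensure_s3_uri(f"{processed_data_bucket}/distinct_path"))
```

The result could then be passed as the path argument, for example wr.s3.read_parquet(path=ensure_s3_uri(...)). This does not help with read_parquet_table, where the path comes from the Glue catalog rather than from the caller.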

Digging into the call to read_parquet_table, I see that the location is resolved via:

res = client_glue.get_table(DatabaseName=gluedatabase, Name="path_tokens")
res['Table']['StorageDescriptor']['Location']

>> <same path from above that does not include s3://>
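The lookup above can be wrapped in a small check that reports whether a table's stored location carries the s3:// scheme. This is only a sketch: the function names are hypothetical, and the client is passed in as a parameter (in practice it would be something like boto3.client("glue")) so the logic can be exercised without AWS access.

```python
def glue_table_location(glue_client, database: str, table: str) -> str:
    """Return the StorageDescriptor location that Glue stores for a table."""
    # get_table is the same Glue API call used in the snippet above.
    response = glue_client.get_table(DatabaseName=database, Name=table)
    return response["Table"]["StorageDescriptor"]["Location"]


def has_s3_scheme(location: str) -> bool:
    """True if the location is a full S3 URI such as 's3://bucket/prefix/'."""
    return location.startswith("s3://")


# Usage sketch (requires AWS credentials, so it is left commented out):
# import boto3
# location = glue_table_location(boto3.client("glue"), gluedatabase, "path_tokens")
# print(location, has_s3_scheme(location))
```

Running this over every table in the database would show which catalog entries are stored without the scheme, independent of what the console displays.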

Because of this, I figured it must be a configuration issue in the data catalogue. However, I checked, and all the tables have the full location path specified, including s3, as does the database, so I cannot see how to work around this issue.

wr.version = 1.9.0
boto3.version = 1.14.53

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

1 reaction
gkennos commented, Sep 2, 2020

[Screenshot: Screen Shot 2020-09-02 at 11 26 24 pm]

Looks correct to me?

0 reactions
igorborgest commented, Sep 2, 2020

A last tip about:

table_list = wr.s3.list_objects(f's3://{processed_data_bucket}')
table_list_for_updating = list(set([t.split(processed_data_bucket)[1].split('/')[1] for t in table_list]))

This can be done with wr.s3.list_directories() instead. It is faster because it does not fetch all the filenames to the client side.
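A minimal sketch of that suggestion follows. It assumes wr.s3.list_directories returns full S3 URIs for the top-level prefixes under the given path; the helper names top_level_names and list_table_names are hypothetical. awswrangler is imported lazily so the parsing helper can be tried without AWS access.

```python
from urllib.parse import urlparse


def top_level_names(paths):
    """Extract the first key segment from each S3 URI, deduplicated and sorted."""
    names = set()
    for uri in paths:
        key = urlparse(uri).path.lstrip("/")  # e.g. 'path_tokens/'
        first_segment = key.split("/", 1)[0]
        if first_segment:
            names.add(first_segment)
    return sorted(names)


def list_table_names(bucket: str):
    """List top-level 'directories' in a bucket (needs AWS credentials)."""
    import awswrangler as wr  # imported here so the helper above has no AWS dependency

    return top_level_names(wr.s3.list_directories(f"s3://{bucket}/"))


# Offline example with made-up URIs:
print(top_level_names(["s3://my-bucket/path_tokens/", "s3://my-bucket/distinct_path/"]))
```

Parsing the URI with urlparse avoids splitting on the bucket name, which could misbehave if the bucket name also appears inside a key.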
