Error reading parquet files made by AWS Athena

I made a bunch of Parquet files using an Amazon Athena CTAS query. I downloaded these files to test locally first (the end goal is to access the data from S3).

If I run the code below:

import s3fs
from petastorm.reader import make_batch_reader
from petastorm.tf_utils import make_petastorm_dataset

dataset_url = "file:///Data/test-parquet"

with make_batch_reader(dataset_url) as reader:
    dataset = make_petastorm_dataset(reader)
    for batch in dataset:
        break
batch.correct

I receive a lot of warnings and then an error at the line "for batch in dataset":

pyarrow.lib.ArrowIOError: The file only has 1 row groups, requested metadata for row group: 1

If I look at dataset.take(1) or something similar, I do see the correct schema of the table. However, I don’t seem to be able to access the data.
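
One way to narrow this down is to check how many row groups each downloaded file actually contains. Row groups are addressed by zero-based index in pyarrow, so a file with 1 row group only has index 0, and a request for index 1 would produce a message like the one above. The following is a minimal sketch, not a fix: it assumes pyarrow is installed, the helper names are hypothetical and not part of petastorm, and the rule for skipping files whose names start with "_" or "." is an assumption about which files are metadata rather than data.

```python
from pathlib import Path


def valid_row_group_indices(num_row_groups):
    """Return the zero-based row group indices that exist in a file."""
    return list(range(num_row_groups))


def count_row_groups(dataset_dir):
    """Map each data file under dataset_dir to its number of row groups.

    Hypothetical helper. Files are not filtered by extension, on the
    assumption that the CTAS output may not carry a .parquet suffix.
    """
    # Imported lazily so the pure helper above works without pyarrow.
    import pyarrow.parquet as pq

    counts = {}
    for path in sorted(Path(dataset_dir).rglob("*")):
        # Assumption: names starting with "_" or "." are metadata, not data.
        if path.is_file() and not path.name.startswith(("_", ".")):
            counts[str(path)] = pq.ParquetFile(str(path)).num_row_groups
    return counts
```

Calling count_row_groups("/Data/test-parquet") and printing the result shows whether every file has a single row group; valid_row_group_indices(1) returns [0], which makes it clear that index 1 is out of range for such a file.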

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 14

Top GitHub Comments

2 reactions
selitvin commented, Jan 28, 2020

Thank you. This is a serious bug. I really appreciate you looking into this. I am in the middle of 0.8.1 release and will add this fix to the release. If you don’t mind, I’ll add you to the reviewers of the fix.

0 reactions
jeisinge commented, Jan 28, 2020

I’ve run a couple of tests and it appears to be working. I’m continuing to test with a full train and I’ll report back by the end of the week once it has run.
