fruits.parquet generated by test/integration.js is unreadable by Hadoop parquet-tools 1.9.0

Build parquet-mr/parquet-tools per these instructions.

Then run its cat command to dump the generated fruits.parquet file:

$ java -jar target/parquet-tools-1.9.0.jar cat parquetjs/fruits.parquet 

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/davidr/workspaces/parquet-mr/parquet-tools/target/parquet-tools-1.9.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/Users/davidr/workspaces/parquetjs/fruits.parquet; isDirectory=false; length=1411554; replication=1; blocksize=33554432; modification_time=1512831680000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}

Using parquetjs v0.8.0.
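
One way to narrow down a "Could not read footer" error is to check the file's framing by hand. Per the Parquet format, a file starts and ends with the 4-byte magic PAR1, and the 4 bytes before the trailing magic hold the footer length as a little-endian integer. The following is a minimal sketch in Node.js under those assumptions; `inspectParquetFraming` is a hypothetical helper, not part of parquetjs or parquet-tools, and it does not handle encrypted-footer files or decode the footer itself.

```javascript
// Hypothetical helper for illustration only (not a parquetjs API).
// Takes the whole file as a Buffer and checks the outer framing.
function inspectParquetFraming(buf) {
  const MAGIC = "PAR1";

  // Smallest possible file: leading magic (4) + footer length (4) + trailing magic (4).
  if (buf.length < 12) {
    return { ok: false, reason: "file too short to be a Parquet file" };
  }

  const head = buf.toString("latin1", 0, 4);
  const tail = buf.toString("latin1", buf.length - 4);

  // Footer length is stored little-endian just before the trailing magic.
  const footerLength = buf.readUInt32LE(buf.length - 8);

  // The footer has to fit between the leading magic and the last 8 bytes.
  const footerFits = footerLength <= buf.length - 12;

  return {
    ok: head === MAGIC && tail === MAGIC && footerFits,
    head,
    tail,
    footerLength,
    footerFits,
  };
}

// Usage (the path is an example):
// console.log(inspectParquetFraming(require("fs").readFileSync("fruits.parquet")));
```

If `ok` is true, the framing is intact and the problem is more likely inside the footer metadata or the encoded pages; if it is false, the file was truncated or written with the wrong framing.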

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
ZJONSSON commented, May 2, 2018

Also, if you want to avoid the headache of building and configuring parquet-tools, you can add this to your .bashrc (or paste it into your console) and let Docker take care of everything.

parquet-tools() { docker run -w /home -v "${PWD}":/home nathanhowell/parquet-tools "$@"; }

You have to be in the same directory as the Parquet file you want to inspect, since the current directory is mounted into the container as /home. You can then use the tools directly on any Parquet file, e.g.:

parquet-tools dump fruits.parquet

0 reactions
ZJONSSON commented, May 1, 2018

You might want to check out this PR, https://github.com/ironSource/parquetjs/pull/56, which has some fixes to RLE encoding and verifies the generated files with parquet-mr.

I think you should be able to install this branch with:

npm install zjonsson/parquetjs#0c7948d4fa64acf76e481256422c6f4a6ba56815