fruits.parquet generated by test/integration.js is unreadable by Hadoop parquet-tools 1.9.0
Build parquet-mr/parquet-tools per these instructions. Then run its cat command to dump the fruits.parquet file that is generated:
$ java -jar target/parquet-tools-1.9.0.jar cat parquetjs/fruits.parquet
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/davidr/workspaces/parquet-mr/parquet-tools/target/parquet-tools-1.9.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/Users/davidr/workspaces/parquetjs/fruits.parquet; isDirectory=false; length=1411554; replication=1; blocksize=33554432; modification_time=1512831680000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
Using parquetjs v0.8.0.
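The "Could not read footer" error can be narrowed down a little without parquet-tools. A Parquet file begins with the 4-byte magic `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again. The sketch below checks only that framing; the function name `check_parquet_framing` is made up for illustration and is not part of parquetjs or parquet-mr. A file that passes this check can still have a footer that fails to decode, so treat it as a first sanity check rather than a diagnosis.

```python
import struct
import sys

MAGIC = b"PAR1"  # magic bytes at the start and end of a Parquet file


def check_parquet_framing(data: bytes) -> dict:
    """Check the head/tail magic and the declared footer length.

    Hypothetical helper for illustration; it does not decode the footer itself.
    """
    result = {
        "head_magic": False,
        "tail_magic": False,
        "footer_len": None,
        "footer_fits": False,
    }
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if len(data) < 12:
        return result
    result["head_magic"] = data[:4] == MAGIC
    result["tail_magic"] = data[-4:] == MAGIC
    # The footer length is a little-endian uint32 just before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    result["footer_len"] = footer_len
    # The footer must fit between the leading magic and the length field.
    result["footer_fits"] = footer_len <= len(data) - 12
    return result


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(check_parquet_framing(f.read()))
```

Saved under any file name, it can be run with the path to fruits.parquet as its only argument. If either magic check fails or the footer length does not fit in the file, the problem is in the file framing rather than in the footer contents.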
Issue Analytics
- State:
- Created 6 years ago
- Comments:5
Also, if you want to avoid the headache of building and configuring parquet-tools, you can simply add this to your .bashrc (or paste it in the console) and use Docker to take care of everything. You have to be in the same directory as the parquet file you want to inspect (since the current directory will be mounted into the container as /home). You can then use the tools directly on any parquet file, i.e.:

You might want to check out this PR here https://github.com/ironSource/parquetjs/pull/56 which has some fixes to RLE encoding and does verification of the generated files with parquet-mr.

I think you should be able to install this branch simply by: