ParquetFile error not finding _metadata directory

System: CentOS 7.3, fastparquet 0.1.2, Anaconda Python 3.6.2

I created some Parquet files using

fastparquet.write(filename, data, compression='SNAPPY')

and now when I try to read those same files with

pf = ParquetFile(filename)

I get the error:

NotADirectoryError: [Errno 20] Not a directory: 'filename/_metadata'

Thoughts?
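One way to rule out a truncated or malformed write is to check that filename really is a single, well-framed Parquet file rather than a dataset directory. The stdlib-only sketch below uses a hypothetical helper, inspect_parquet_file (not part of fastparquet), and checks only the framing described in the Parquet format specification: the 4-byte magic b"PAR1" at the start, and at the end a 4-byte little-endian footer (metadata) length followed by b"PAR1" again. Passing this check does not guarantee that a given fastparquet version can decode the schema stored in that footer.

```python
import os
import struct

MAGIC = b"PAR1"


def inspect_parquet_file(path):
    """Hypothetical helper: sanity-check a single Parquet file's framing.

    A Parquet file starts with b"PAR1" and ends with a 4-byte little-endian
    footer length followed by b"PAR1". Returns the footer length in bytes,
    or raises ValueError if the framing looks wrong.
    """
    if os.path.isdir(path):
        raise ValueError(f"{path!r} is a directory, not a single Parquet file")
    size = os.path.getsize(path)
    # 4 bytes header magic + 4 bytes footer length + 4 bytes trailing magic
    if size < 12:
        raise ValueError(f"{path!r} is too small to be a Parquet file")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # last 8 bytes: footer length + magic
        tail = f.read(8)
    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError(f"{path!r} does not have Parquet magic bytes")
    footer_len = struct.unpack("<I", tail[:4])[0]
    # The footer must fit between the leading magic and the 8-byte trailer.
    if footer_len > size - 12:
        raise ValueError(f"footer length {footer_len} exceeds file size {size}")
    return footer_len
```

For example, inspect_parquet_file('file1.parq') should return the footer size for a well-formed file and raise ValueError otherwise.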

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 16 (7 by maintainers)

Top GitHub Comments

Wercurial commented, Jan 7, 2020 (1 reaction)

Hello, I have the same problem, but I didn’t understand your solution above. Can you elaborate? Thank you. My fastparquet version is 0.1.5.

virtualluke commented, Oct 16, 2017 (1 reaction)

# ls -al
total 16961
drwxr-xr-x 2 root root     4096 Oct 16 23:34 .
drwxr-xr-x 3 root root     4096 Oct 16 23:34 ..
-rw-r--r-- 1 root root 17359516 Oct 16 23:34 file1.parq
# python
Python 3.6.2 |Anaconda, Inc.| (default, Sep 30 2017, 18:42:57)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastparquet
>>> fastparquet.ParquetFile('file1.parq')
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.6/site-packages/fastparquet/api.py", line 96, in __init__
    with open_with(fn2, 'rb') as f:
  File "/opt/anaconda3/lib/python3.6/site-packages/fastparquet/util.py", line 44, in default_open
    return open(f, mode)
NotADirectoryError: [Errno 20] Not a directory: 'file1.parq/_metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.6/site-packages/fastparquet/api.py", line 102, in __init__
    self._parse_header(f, verify)
  File "/opt/anaconda3/lib/python3.6/site-packages/fastparquet/api.py", line 125, in _parse_header
    self._set_attrs()
  File "/opt/anaconda3/lib/python3.6/site-packages/fastparquet/api.py", line 139, in _set_attrs
    self.schema = schema.SchemaHelper(self._schema)
  File "/opt/anaconda3/lib/python3.6/site-packages/fastparquet/schema.py", line 80, in __init__
    self.text = schema_to_text(self.schema_elements[0])
  File "/opt/anaconda3/lib/python3.6/site-packages/fastparquet/schema.py", line 47, in schema_to_text
    text += '\n' + schema_to_text(child, indent)
  File "/opt/anaconda3/lib/python3.6/site-packages/fastparquet/schema.py", line 36, in schema_to_text
    root.converted_type])
KeyError: 24
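Judging from this traceback, ParquetFile.__init__ first tries to open file1.parq/_metadata (api.py line 96) and only falls back to reading file1.parq as a single file inside the exception handler (api.py line 102). The NotADirectoryError is therefore just the failed first probe; the error that actually stops the read is the later KeyError: 24, raised in schema_to_text while looking up the column's converted_type. The sketch below is a simplified, hypothetical stand-in (open_single_or_dataset and parse are illustrative names, not fastparquet's code) that reproduces the same implicit exception chaining in plain Python.

```python
import os


def open_single_or_dataset(path, parse):
    """Hypothetical sketch of a "dataset first, single file second" lookup.

    It first tries path/_metadata (the dataset layout). If that fails with an
    OSError (NotADirectoryError on POSIX when path is a plain file), it falls
    back to parsing path itself. Any error raised by the fallback is
    implicitly chained to the first one, so Python prints both tracebacks.
    """
    try:
        with open(os.path.join(path, "_metadata"), "rb") as f:
            return parse(f)
    except OSError:
        # An exception raised here gets __context__ set to the OSError above.
        with open(path, "rb") as f:
            return parse(f)
```

When the fallback parse fails, the new exception's __context__ attribute holds the original OSError, which is why Python prints "During handling of the above exception, another exception occurred". When debugging, the last exception in such a chain is usually the one worth investigating.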