Exception during artifacts retrieving
Hello, I'm trying to code a flow in which a ClearML Task reads from a data source, divides the data into n batches, and then starts n different subtasks to process them in parallel. Each batch (a pandas DataFrame) is first uploaded as an artifact (using the method upload_artifact) so that the subtasks (created through the function create_function_task) can access it through the artifacts dictionary.
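A minimal sketch of the flow described above, under stated assumptions: the helper names batch_bounds and launch_subtasks, the batch_N naming scheme, and the worker argument are hypothetical and do not come from the original script. parent_task is assumed to be a clearml.Task; only its upload_artifact and create_function_task methods are called, so the helpers themselves need no ClearML import.

```python
from typing import Any, Callable, List, Sequence, Tuple


def batch_bounds(total: int, n_batches: int) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs that split `total` rows into
    `n_batches` contiguous, nearly equal slices."""
    if n_batches <= 0:
        raise ValueError("n_batches must be positive")
    base, extra = divmod(total, n_batches)
    bounds, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches receive one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def launch_subtasks(parent_task: Any, batches: Sequence[Any],
                    worker: Callable) -> List[str]:
    """Upload each batch as an artifact of `parent_task`, then create one
    function task per batch. `parent_task` is assumed to be a clearml.Task."""
    names = []
    for i, batch in enumerate(batches):
        name = f"batch_{i}"  # hypothetical naming scheme
        parent_task.upload_artifact(name=name, artifact_object=batch)
        parent_task.create_function_task(
            worker,
            func_name=f"process_{name}",
            task_name=f"process {name}",
            artifact_name=name,  # extra keyword argument stored for the subtask
        )
        names.append(name)
    return names
```

With a DataFrame df, the batches could be built as [df.iloc[a:b] for a, b in batch_bounds(len(df), n)]. The traceback in this issue reads the artifact name back from parameters["Function/artifact_name"], which suggests the extra keyword argument is stored under a "Function" section, but that detail is inferred from the traceback rather than shown in the issue.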
After uploading an artifact, I tried to retrieve it using task.artifacts["artifact_name"].get(). The latter raises the following exception:
Traceback (most recent call last):
  File "examples\clearml_integration.py", line 115, in <module>
    batch = parent_task.artifacts[parameters["Function/artifact_name"]].get()
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\clearml\binding\artifacts.py", line 160, in get
    self._object = pd.read_csv(local_file, index_col=[0])
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas\_libs\parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
  File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
  File "C:\Users\lb\AppData\Local\Programs\Python\Python38\lib\_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "C:\Users\lb\AppData\Local\Programs\Python\Python38\lib\gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "C:\Users\lb\AppData\Local\Programs\Python\Python38\lib\gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b',s')
From the UI, I noticed that the batch artifacts are stored with the extension ".csv.gz". However, after downloading them through the UI, I also noticed that they contain plain CSV; no compression is applied. I think the exception is raised by the pandas library when reading them, because it infers gzip compression from the extension even though the file contains plain CSV data.
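The diagnosis above can be reproduced with the standard library alone. The sketch below is illustrative and not ClearML's or pandas' code: the file name batch_0.csv.gz, the CSV content, and the helpers is_gzipped, read_csv_rows, and demo are all hypothetical. It writes plain CSV under a ".csv.gz" name, shows that trusting the extension fails with gzip.BadGzipFile, and then reads the file by checking the two gzip magic bytes first.

```python
import csv
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def is_gzipped(path):
    """Check the file content, not its extension."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def read_csv_rows(path):
    """Read CSV rows, decompressing only if the content is really gzip."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Plain CSV stored under a misleading ".csv.gz" name.
        path = os.path.join(tmp, "batch_0.csv.gz")
        with open(path, "w", newline="", encoding="utf-8") as fh:
            fh.write(",score\n0,1.5\n")

        # Trusting the extension fails, as in the traceback above.
        try:
            with gzip.open(path, "rt") as fh:
                fh.read()
            extension_trusted_ok = True
        except gzip.BadGzipFile:
            extension_trusted_ok = False

        # Checking the magic bytes first reads the file correctly.
        return extension_trusted_ok, read_csv_rows(path)
```

The b',s' in the error message is the pair of bytes gzip read where it expected the magic number, which is consistent with a CSV header that begins with an empty index column name. With pandas, a comparable workaround would presumably be to obtain the file path from the artifact's get_local_copy() and pass compression=None to pd.read_csv, which disables extension-based inference; this was not verified against the versions listed in this issue.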
Additional Information
Python Version:
Python 3.8.10
Pip freeze output:
attrs==20.3.0
backports.entry-points-selectable==1.1.0
certifi==2021.5.30
chardet==4.0.0
clearml==1.0.4
clearml-agent==1.0.0
distlib==0.3.2
filelock==3.0.12
furl==2.1.2
future==0.18.2
humanfriendly==9.2
idna==2.10
joblib==1.0.1
jsonschema==3.2.0
mypy-extensions==0.4.3
neo4j==4.2.0
numpy==1.21.1
orderedmultidict==1.0.1
packaging==21.0
pandas==1.3.1
pandera==0.6.5
pathlib2==2.3.6
Pillow==8.3.1
platformdirs==2.1.0
psutil==5.8.0
pyhocon==0.3.58
PyJWT==2.0.1
pyparsing==2.4.7
pyreadline==2.1
pyrsistent==0.18.0
python-dateutil==2.8.2
pytz==2021.1
PyYAML==5.3.1
requests==2.25.1
scikit-learn==0.24.2
scipy==1.7.0
six==1.15.0
threadpoolctl==2.2.0
typing==3.7.4.3
typing-extensions==3.10.0.0
typing-inspect==0.7.1
urllib3==1.26.6
virtualenv==20.6.0
wrapt==1.12.1
Issue Analytics
- Created 2 years ago
- Comments: 9 (5 by maintainers)
Hello everyone, I can confirm that with the new release the issue no longer exists. Thank you 👍
Hi @luigiba,
ClearML Server v1.1.1 was released with a specific fix for this issue - please check it out and let us know if it works for you 🙂