Exception during artifact retrieval

Hello, I'm trying to code a flow in which a ClearML Task reads from a data source, divides the data into n batches, and then starts n different subtasks to process them in parallel. Each batch (a pandas DataFrame) is first uploaded as an artifact (using the method upload_artifact) so that the subtasks (created through the function create_function_task) can access it through the artifacts dictionary.
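The batching step of this flow can be sketched with plain pandas. This is a minimal sketch under stated assumptions: the helper split_into_batches and the batch_<i> artifact names are hypothetical, and the ClearML calls (upload_artifact on the parent task, create_function_task for the subtasks) are only indicated in comments, not executed.

```python
import pandas as pd


def split_into_batches(df, n):
    """Split df into n contiguous batches whose sizes differ by at most one row.

    Batches can be empty when n is larger than len(df).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Row boundaries 0 = b0 <= b1 <= ... <= bn = len(df).
    bounds = [len(df) * i // n for i in range(n + 1)]
    return [df.iloc[bounds[i]:bounds[i + 1]] for i in range(n)]


data = pd.DataFrame({"value": range(10)})
batches = split_into_batches(data, 3)

# Hypothetical artifact names. In the flow described above, the parent task
# would upload each batch, e.g. task.upload_artifact(name, artifact_object=batch),
# and each subtask would receive the name of its batch as a parameter.
artifact_names = [f"batch_{i}" for i in range(len(batches))]
for name, batch in zip(artifact_names, batches):
    print(name, len(batch))
```

Contiguous slicing keeps each batch's original index, so results from the subtasks can be reassembled in order with pd.concat.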

After uploading an artifact, I tried to retrieve it using task.artifacts["artifact_name"].get(). This call raises the following exception:

Traceback (most recent call last):
  File "examples\clearml_integration.py", line 115, in <module>
    batch = parent_task.artifacts[parameters["Function/artifact_name"]].get()
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\clearml\binding\artifacts.py", line 160, in get
    self._object = pd.read_csv(local_file, index_col=[0])
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\lb\.clearml\venvs-builds\3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas\_libs\parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
  File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
  File "C:\Users\lb\AppData\Local\Programs\Python\Python38\lib\_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "C:\Users\lb\AppData\Local\Programs\Python\Python38\lib\gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "C:\Users\lb\AppData\Local\Programs\Python\Python38\lib\gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b',s')

From the UI, I noticed that the batch artifacts are stored with the extension ".csv.gz". However, after downloading them through the UI, I also noticed that they contain plain CSV: no compression is applied. I think the exception is raised by pandas when reading them, because it infers gzip compression from the ".gz" extension even though the file contains plain CSV data. The bytes reported in the error, b',s', are consistent with this: they look like the start of a CSV header line, not the gzip signature b'\x1f\x8b'.
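The behavior described above can be reproduced with pandas alone. This is a minimal sketch, assuming the artifact is available as a local file: it writes an uncompressed CSV under a ".csv.gz" name, attempts the default read (compression="infer"), and then reads the file with a hypothetical helper, read_csv_artifact, that checks the gzip magic bytes instead of trusting the extension. It is a local workaround, not ClearML's own loader.

```python
import os
import tempfile

import pandas as pd

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def read_csv_artifact(path):
    """Read a CSV artifact, decompressing only if the file really is gzip.

    Hypothetical workaround: pandas' default compression="infer" trusts the
    ".gz" extension, so here the first two bytes are sniffed instead.
    """
    with open(path, "rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC
    # index_col=[0] mirrors the read_csv call shown in the traceback.
    return pd.read_csv(path, index_col=[0], compression="gzip" if is_gzip else None)


# Reproduce the reported situation: plain CSV saved under a ".csv.gz" name.
df = pd.DataFrame({"score": [3, 1, 2]})
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "batch_0.csv.gz")
df.to_csv(plain_path, compression=None)  # deliberately left uncompressed

try:
    pd.read_csv(plain_path, index_col=[0])  # compression="infer" -> gzip
except OSError as exc:  # gzip.BadGzipFile is a subclass of OSError
    print("default read failed:", exc)

print(read_csv_artifact(plain_path))
```

Because DataFrame.to_csv writes the unnamed index as an empty first header field, this file begins with ",score", i.e. the same two bytes b',s' seen in the reported error. If you only hold the artifact object, a local path may be obtainable through its get_local_copy() method; check that it is available in your clearml version.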

Additional Information

Python Version: Python 3.8.10

Pip freeze output:

attrs==20.3.0
backports.entry-points-selectable==1.1.0
certifi==2021.5.30
chardet==4.0.0
clearml==1.0.4
clearml-agent==1.0.0
distlib==0.3.2
filelock==3.0.12
furl==2.1.2
future==0.18.2
humanfriendly==9.2
idna==2.10
joblib==1.0.1
jsonschema==3.2.0
mypy-extensions==0.4.3
neo4j==4.2.0
numpy==1.21.1
orderedmultidict==1.0.1
packaging==21.0
pandas==1.3.1
pandera==0.6.5
pathlib2==2.3.6
Pillow==8.3.1
platformdirs==2.1.0
psutil==5.8.0
pyhocon==0.3.58
PyJWT==2.0.1
pyparsing==2.4.7
pyreadline==2.1
pyrsistent==0.18.0
python-dateutil==2.8.2
pytz==2021.1
PyYAML==5.3.1
requests==2.25.1
scikit-learn==0.24.2
scipy==1.7.0
six==1.15.0
threadpoolctl==2.2.0
typing==3.7.4.3
typing-extensions==3.10.0.0
typing-inspect==0.7.1
urllib3==1.26.6
virtualenv==20.6.0
wrapt==1.12.1

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

luigiba commented, Aug 24, 2021 (1 reaction)

Hello everyone, I can confirm that the issue no longer exists with the new release. Thank you 👍

jkhenning commented, Aug 7, 2021 (0 reactions)

Hi @luigiba,

ClearML Server v1.1.1 was released with a specific fix for this issue - please check it out and let us know if it works for you 🙂
