
Last data chunk does not get uploaded to GCP bucket

See original GitHub issue

I created a dataset following: https://clear.ml/docs/latest/docs/clearml_data/data_management_examples/data_man_simple

When I upload it to my GCP bucket by running:

(yolo555) ➜  yolov5 git:(master) ✗ clearml-data close --storage gs://xxx/clearml-test --chunk-size 128 --verbose

The last lines of output are:

Uploading dataset changes (98 files compressed to 94.76 MiB) to gs://icm-data-lake/clearml-test
Uploading dataset changes (98 files compressed to 94.73 MiB) to gs://icm-data-lake/clearml-test
Uploading dataset changes (97 files compressed to 94.54 MiB) to gs://icm-data-lake/clearml-test
Uploading dataset changes (98 files compressed to 94.23 MiB) to gs://icm-data-lake/clearml-test
2022-09-08 09:55:31,054 - clearml.storage - ERROR - Failed uploading: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Read timed out. (read timeout=60)
File compression and upload completed: total size 38.47 GiB, 320 chunk(s) stored (average size 123.11 MiB)
Dataset closed and finalized

Did the last chunk fail to get uploaded, or is it just a false alarm?
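
For reference, the CLI flow above corresponds roughly to the following ClearML Python SDK usage (a sketch only; the dataset name, project, source directory, and bucket path are placeholders, and it assumes a clearml version whose Dataset.upload accepts a chunk_size argument):

from clearml import Dataset

# Placeholders: adjust the name, project, source path, and bucket to your setup
ds = Dataset.create(dataset_name="clearml-test", dataset_project="examples")
ds.add_files(path="data/")
# chunk_size mirrors the CLI's --chunk-size flag (size in MB per chunk)
ds.upload(output_url="gs://my-bucket/clearml-test", chunk_size=128, verbose=True)
ds.finalize()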

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
erezalg commented, Sep 12, 2022

Hi @mikel-brostrom,

That… shouldn't happen 😃 Does this persist, i.e. does it happen every time? Also, it might sound silly, but did you try downloading the dataset and checking whether all the files are there? It should be easy to compare the original and downloaded files.
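
A minimal sketch of that comparison, assuming the original files live under data/ and using the ID of the finalized dataset (both placeholders here):

import os
from clearml import Dataset

def relative_files(root):
    # Collect every file path under root, relative to root
    return {
        os.path.relpath(os.path.join(dirpath, name), root)
        for dirpath, _, names in os.walk(root)
        for name in names
    }

# Download a local copy of the dataset and diff it against the source dir
downloaded = Dataset.get(dataset_id="<dataset-id>").get_local_copy()
missing = relative_files("data/") - relative_files(downloaded)
print(len(missing), "file(s) missing from the downloaded copy:", sorted(missing))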

In the meantime, I'll check internally whether we have any safeguards against partial-upload issues.

0 reactions
mikel-brostrom commented, Oct 7, 2022

I ran the command again. No issues this time:

Uploading dataset changes (90 files compressed to 95.01 MiB) to gs://icm-data-lake/clearml-test
Uploading dataset changes (89 files compressed to 94.66 MiB) to gs://icm-data-lake/clearml-test
File compression and upload completed: total size 38.58 GiB, 320 chunk(s) stored (average size 123.46 MiB)
Dataset closed and finalized

I guess this is not an issue anymore, @erezalg 😄

Read more comments on GitHub >

Top Results From Across the Web

Perform resumable uploads | Cloud Storage - Google Cloud
Once you have initiated a resumable upload, there are two ways to upload the object's data: In a single chunk: This approach is...
Read more >
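
As a sketch of the chunked approach with the google-cloud-storage Python client (bucket and object names are placeholders): setting Blob.chunk_size forces a resumable upload in fixed-size chunks, and raising the per-request timeout above the 60 s default helps avoid read timeouts like the one in the log above.

from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("clearml-test/dataset.zip")

# Force a resumable, chunked upload; the value must be a multiple of 256 KiB
blob.chunk_size = 16 * 1024 * 1024  # 16 MiB

# Default per-request timeout is 60 s, matching the read timeout seen above
blob.upload_from_filename("dataset.zip", timeout=300)
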
Upload in chunks to Google Cloud Storage error out 503 ...
I got the solution it was my mistake. As mentioned here, Google is very particular about chunk size. Chunk size restriction: All chunks...
Read more >
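
The restriction referred to here is that every chunk except the last must be a multiple of 256 KiB. A small helper (purely illustrative, not part of any library API) can round a requested size up to a valid value:

# Every chunk except the last must be a multiple of 256 KiB (262144 bytes)
CHUNK_GRANULARITY = 256 * 1024

def valid_chunk_size(requested):
    # Round the requested size up to the nearest 256 KiB multiple
    blocks = -(-requested // CHUNK_GRANULARITY)  # ceiling division
    return max(1, blocks) * CHUNK_GRANULARITY

print(valid_chunk_size(1_000_000))  # 1048576, i.e. exactly 1 MiB
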
Creating a resumable upload from chunks · Issue #132 - GitHub
When I retrieve a chunk on the api endpoint, I tried two methods: using file.save() as the documentation says "Resumable uploads are ...
Read more >
gsutil Archives - Jayendra's Cloud Certification Blog
Streaming uploads are useful when uploading data whose final size is not known at the start of the upload, such as when generating...
Read more >
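
For that streaming case, the Python client's Blob.open("wb") returns a file-like writer that uploads in chunks without declaring the total size up front (a sketch; bucket, object name, and the data loop are placeholders):

from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("clearml-test/stream.bin")

# The writer streams the upload chunk by chunk; no total size is declared
with blob.open("wb", chunk_size=8 * 1024 * 1024) as writer:
    for _ in range(16):
        writer.write(b"\x00" * 1024 * 1024)  # stand-in for generated data
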
Cloud Storage Go Reference
An object holds arbitrary data as a sequence of bytes, like a file. You refer to objects using a handle, just as with...
Read more >
