question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Cannot create an Artifact with 7 files of size 975244241 bytes

See original GitHub issue

I have the following script:

import wandb

if __name__ == "__main__":
  with wandb.init(
      project="private",
      entity="private",
      tags=["DELETEME"],
  ) as wandb_run:
    artifact = wandb.Artifact("deleteme", type="deleteme")

    for epoch in range(100):
      print(f"Epoch {epoch}")
      with artifact.new_file(f"checkpoint{epoch}", mode="wb") as f:
        contents = open("/dev/urandom", "rb").read(975244241)
        print(len(contents))
        f.write(contents)

Running this script I see the following error:

[nix-shell:/efs/research/lottery]$ python deleteme.py 
wandb: Currently logged in as: skainswo. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.12.18 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.16
wandb: Run data is saved locally in /tmp/wandb/run-20220621_073539-11aknk72
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run silvery-microwave-547
wandb: ⭐️ View project at https://wandb.ai/skainswo/playing-the-lottery
wandb: 🚀 View run at https://wandb.ai/skainswo/playing-the-lottery/runs/11aknk72
Epoch 0
975244241
Epoch 1
975244241
Epoch 2
975244241
Epoch 3
975244241
Epoch 4
975244241
Epoch 5
975244241
Epoch 6
975244241
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb:                                                                                
wandb: Synced silvery-microwave-547: https://wandb.ai/skainswo/playing-the-lottery/runs/11aknk72
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: /tmp/wandb/run-20220621_073539-11aknk72/logs
Traceback (most recent call last):
  File "/efs/research/lottery/deleteme.py", line 16, in <module>
    f.write(contents)
OSError: [Errno 28] No space left on device

[nix-shell:/efs/research/lottery]$ df -H
Filesystem                Size  Used Avail Use% Mounted on
devtmpfs                  3.3G     0  3.3G   0% /dev
tmpfs                      33G     0   33G   0% /dev/shm
tmpfs                      17G  7.6M   17G   1% /run
tmpfs                      33G  345k   33G   1% /run/wrappers
/dev/disk/by-label/nixos  136G  118G   11G  92% /
<censored IP>:/             9.3E  2.7G  9.3E   1% /efs
tmpfs                     6.5G  263k  6.5G   1% /run/user/1000

[nix-shell:/efs/research/lottery]$ 

which is incorrect, considering I have more than enough free space left on disk to fit the files. I’ve also replicated this on machine with terabytes of disk space left, and still see the same error.

For the sake of exact reproducibility, I’m running with the following shell.nix:

# Run with nixGL, eg `nixGLNvidia-510.47.03 python cifar10_convnet_run.py --test`

# To prevent JAX from allocating all GPU memory: XLA_PYTHON_CLIENT_PREALLOCATE=false
# To push build to cachix: nix-store -qR --include-outputs $(nix-instantiate shell.nix) | cachix push ploop

let
  # pkgs = import (/home/skainswo/dev/nixpkgs) { };

  # Last updated: 2022-05-16. Check for new commits at status.nixos.org.
  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/556ce9a40abde33738e6c9eac65f965a8be3b623.tar.gz") {
    config.allowUnfree = true;
    # These actually cause problems for some reason. bug report?
    # config.cudaSupport = true;
    # config.cudnnSupport = true;
  };
in
pkgs.mkShell {
  buildInputs = with pkgs; [
    ffmpeg
    python3
    python3Packages.augmax
    python3Packages.einops
    python3Packages.flax
    python3Packages.ipython
    python3Packages.jax
    # See https://discourse.nixos.org/t/petition-to-build-and-cache-unfree-packages-on-cache-nixos-org/17440/14
    # as to why we don't use the source builds of jaxlib/tensorflow.
    (python3Packages.jaxlib-bin.override {
      cudaSupport = true;
    })
    python3Packages.matplotlib
    # python3Packages.pandas
    python3Packages.plotly
    # python3Packages.scikit-learn
    (python3Packages.tensorflow-bin.override {
      cudaSupport = false;
    })
    # Thankfully tensorflow-datasets does not have tensorflow as a propagatedBuildInput. If that were the case for any
    # of these dependencies, we'd be in trouble since Python does not like multiple versions of the same package in
    # PYTHONPATH.
    python3Packages.tensorflow-datasets
    python3Packages.tqdm
    python3Packages.wandb
    yapf
  ];

  # Don't clog EFS with wandb results. Wandb will create and use /tmp/wandb.
  WANDB_DIR = "/tmp";

  # Don't check this into version control!
  WANDB_API_KEY = "... secret ...";
}

In short: I’m running on NixOS 22.05, wandb client version 0.12.16, Python 3.9.12 on a p3.2xlarge EC2 instance. I’ve also replicated the issue on Ubuntu, and with and without a custom WANDB_DIR setting.

Why can’t I save 7 files in an Artifact?

Issue Analytics

  • State:open
  • Created a year ago
  • Comments:19 (2 by maintainers)

github_iconTop GitHub Comments

1reaction
vanpeltcommented, Jul 13, 2022

I take that back, it writes to tempfile.TemporaryDirectory() from the python stdlib. You can set _artifact_dir on an instance of artifact to an empty directory on a partition that has sufficient space…

0reactions
luisberguacommented, Nov 7, 2022

Hi @dorbittonn, thanks for writing in! I’ve followed up with you answering the email that you sent to support. Thanks!

Read more comments on GitHub >

github_iconTop Results From Across the Web

Cannot create an Artifact with 7 files of size 975244241 bytes
In short: I'm running on NixOS 22.05, wandb client version 0.12.16, Python 3.9.12 on a p3.2xlarge EC2 instance. I've also replicated the issue ......
Read more >
Build artifacts are failing to upload when artifact container is "."
Thing.7z Begin chunking upload file './Thing.7z', chunk size '4194304 Bytes', total chunks '2'. Attempt '1' for uploading chunk '1' of file ...
Read more >
Maximum build artifact file size (global build setting ... - YouTrack
"Failed to publish artifacts: Artifact file 'BigTestFile3' has size 2147483648 bytes which exceeds maximum allowed size of 1073741824 bytes. Maximum artifact ...
Read more >
Archaeology | National Geographic Society
These remains can be any objects that people created, modified, or used. ... Archaeologists excavate and study features and artifacts, ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found