Training memory optimizations not working on AMD hardware
### Describe the bug
The DreamBooth training example has a section about training on a 16GB GPU. Since Radeon Navi 21 series cards all ship with 16GB of VRAM, this would in theory expand the range of hardware that can train models by a large margin.
The problem is that, at least out of the box, neither of the optimizations `--gradient_checkpointing` nor `--use_8bit_adam` seems to support AMD cards.
### Reproduction
Using the example command with PyTorch ROCm 5.1.1 (`pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1`), sketched after this list:

- `--gradient_checkpointing`: returns the error `'UNet2DConditionModel' object has no attribute 'enable_gradient_checkpointing'`
- `--use_8bit_adam`: throws a handful of CUDA errors; see the Logs section below for the main part (is `bitsandbytes` Nvidia-specific, and if it is, is there an AMD implementation available?)
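For reference, a sketch of the kind of invocation used; the flag set follows the 16GB section of the DreamBooth example README, while the data paths and prompt are placeholders:

```bash
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="./instance_images" \
  --output_dir="./dreambooth_out" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400
```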
### Logs
Using `--gradient_checkpointing`:

```
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
Traceback (most recent call last):
  File "/home/foobar/diffusers/examples/dreambooth/train_dreambooth.py", line 606, in <module>
    main()
  File "/home/foobar/diffusers/examples/dreambooth/train_dreambooth.py", line 408, in main
    unet.enable_gradient_checkpointing()
  File "/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'UNet2DConditionModel' object has no attribute 'enable_gradient_checkpointing'
Traceback (most recent call last):
  File "/home/foobar/diffusers/.venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
```
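The `AttributeError` suggests the installed `diffusers` release may simply predate gradient checkpointing support on the UNet. A quick diagnostic sketch (not from the issue) to check whether the installed build exposes the method at all:

```python
import diffusers
from diffusers import UNet2DConditionModel

# If this prints False, the installed release does not ship
# enable_gradient_checkpointing yet, so the flag cannot work
# regardless of the GPU vendor.
print(diffusers.__version__)
print(hasattr(UNet2DConditionModel, "enable_gradient_checkpointing"))
```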
Using `--use_8bit_adam`:

```
...
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary /home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/bitsandbytes/cextension.py:48: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
...
```
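For context, a small check (an illustrative sketch, not part of the original report) showing why the `libcudart.so` lookup fails on this setup:

```python
import torch

# On a ROCm build, torch.version.hip is set (e.g. "5.1.1") and
# torch.version.cuda is None, yet torch.cuda.is_available() still returns
# True because ROCm reuses PyTorch's CUDA API surface. bitsandbytes searches
# for libcudart.so outside of PyTorch, so it falls back to its CPU binary.
print("HIP runtime:", torch.version.hip)
print("CUDA runtime:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
```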
### System Info
- `diffusers` version: 0.3.0
- Platform: Linux-5.15.67-x86_64-with-glibc2.34
- Python version: 3.9.13
- PyTorch version (GPU?): 1.12.1+rocm5.1.1 (True)
- Huggingface_hub version: 0.9.1
- Transformers version: 4.22.2
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
### Top GitHub Comments
We will release a new `diffusers` version very soon!

The `use_8bit_adam` problems potentially are Nvidia-specific, as `bitsandbytes` includes a C extension which wraps some CUDA functions, i.e., it doesn't run through pytorch-rocm. Not really anything that can be fixed on this end, though.