Lack of memory clearing causes some GPU tests to fail in test_models_unet.py
Describe the bug
When running pytest, two additional UNet tests pass if I suppress test_attention_block_default. With the test enabled:
test_layers_utils.py .................
test_models_unet.py ...............FF..............F.........s....s..
versus, with test_attention_block_default suppressed:
test_layers_utils.py .............s...
test_models_unet.py ...............................F.........s....s..
This can be checked either by suppressing the test or by running the UNet tests with and without test_attention_block_default:
pytest test_layers_utils.py::AttentionBlockTests::test_attention_block_default test_models_unet.py
pytest test_models_unet.py
The tests it causes to fail are:
FAILED test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate - IndexError: list index out of range
FAILED test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results - IndexError: list index out of range
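For context on the IndexError itself: the traceback in the logs shows `device_map = {'': 'cpu'}` arriving at accelerate's `dispatch_model`, which then takes the first device that is neither "cpu" nor "disk". With a CPU-only map that list is empty, so indexing it fails. The sketch below reproduces only that one expression in plain Python. `pick_main_device` is a hypothetical helper added for illustration and is not part of accelerate. Why `device_map="auto"` resolved to CPU is not shown here; the issue title suggests GPU memory left over from the earlier test.

```python
from typing import Dict, Optional, Union

Device = Union[str, int]


def pick_main_device(device_map: Dict[str, Device]) -> Optional[Device]:
    """Hypothetical helper: first device that is not "cpu" or "disk", else None."""
    candidates = [d for d in device_map.values() if d not in ["cpu", "disk"]]
    return candidates[0] if candidates else None


# Device map reported in the traceback: the whole model was placed on the CPU.
cpu_only_map = {"": "cpu"}

try:
    # Same expression as the failing line quoted in the traceback.
    main_device = [d for d in cpu_only_map.values() if d not in ["cpu", "disk"]][0]
except IndexError as err:
    print(f"IndexError: {err}")

# The guarded variant returns None instead of raising.
print(pick_main_device(cpu_only_map))
# With a GPU entry in the map, the GPU index is selected.
print(pick_main_device({"": 0}))
```

The guarded variant only illustrates the failure mode; it does not make the tests meaningful on a CPU-only map, since they are meant to run on GPU.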
Reproduction
Run the following in the test directory:
pytest test_layers_utils.py::AttentionBlockTests::test_attention_block_default test_models_unet.py
pytest test_models_unet.py
Logs
pytest test_layers_utils.py::AttentionBlockTests::test_attention_block_default test_models_unet.py
============================================================================================== test session starts ===============================================================================================
platform win32 -- Python 3.8.13, pytest-7.1.3, pluggy-1.0.0
rootdir: C:\Users\currentuser\diffusers
plugins: anyio-3.6.1, hydra-core-1.2.0, cov-4.0.0, mock-3.8.2
collected 50 items
test_layers_utils.py . [ 2%]
test_models_unet.py ...............FF..............F.........s....s.. [100%]
==================================================================================================== FAILURES ====================================================================================================
_______________________________________________________________________________ UNetLDMModelTests.test_from_pretrained_accelerate ________________________________________________________________________________
self = <tests.test_models_unet.UNetLDMModelTests testMethod=test_from_pretrained_accelerate>
@unittest.skipIf(torch_device == "cpu", "This test is supposed to run on GPU")
def test_from_pretrained_accelerate(self):
> model, _ = UNet2DModel.from_pretrained(
"fusing/unet-ldm-dummy-update", output_loading_info=True, device_map="auto"
)
test_models_unet.py:140:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\src\diffusers\modeling_utils.py:399: in from_pretrained
accelerate.load_checkpoint_and_dispatch(model, model_file, device_map)
..\..\mambaforge\envs\ldm\lib\site-packages\accelerate\big_modeling.py:367: in load_checkpoint_and_dispatch
return dispatch_model(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
model = UNet2DModel(
(conv_in): Conv2d(4, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(time_proj): Timesteps()
...-05, affine=True)
(conv_act): SiLU()
(conv_out): Conv2d(32, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
device_map = {'': 'cpu'}, main_device = None, state_dict = None, offload_dir = None, offload_buffers = False, preload_module_classes = None
def dispatch_model(
model: nn.Module,
device_map: Dict[str, Union[str, int, torch.device]],
main_device: Optional[torch.device] = None,
state_dict: Optional[Dict[str, torch.Tensor]] = None,
offload_dir: Union[str, os.PathLike] = None,
offload_buffers: bool = False,
preload_module_classes: Optional[List[str]] = None,
):
"""
Dispatches a model according to a given device map. Layers of the model might be spread across GPUs, offloaded on
the CPU or even the disk.
Args:
model (`torch.nn.Module`):
The model to dispatch.
device_map (`Dict[str, Union[str, int, torch.device]]`):
A dictionary mapping module names in the models `state_dict` to the device they should go to. Note that
`"disk"` is accepted even if it's not a proper value for `torch.device`.
main_device (`str`, `int` or `torch.device`, *optional*):
The main execution device. Will default to the first device in the `device_map` different from `"cpu"` or
`"disk"`.
state_dict (`Dict[str, torch.Tensor]`, *optional*):
The state dict of the part of the model that will be kept on CPU.
offload_dir (`str` or `os.PathLike`):
The folder in which to offload the model weights (or where the model weights are already offloaded).
offload_buffers (`bool`, *optional*, defaults to `False`):
Whether or not to offload the buffers with the model parameters.
preload_module_classes (`List[str]`, *optional*):
A list of classes whose instances should load all their weights (even in the submodules) at the beginning
of the forward. This should only be used for classes that have submodules which are registered but not
called directly during the forward, for instance if a `dense` linear layer is registered, but at forward,
`dense.weight` and `dense.bias` are used in some operations instead of calling `dense` directly.
"""
if not is_torch_version(">=", "1.9.0"):
raise NotImplementedError("Model dispatching requires torch >= 1.9.0")
# Error early if the device map is incomplete.
check_device_map(model, device_map)
if main_device is None:
> main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
E IndexError: list index out of range
..\..\mambaforge\envs\ldm\lib\site-packages\accelerate\big_modeling.py:244: IndexError
_____________________________________________________________________ UNetLDMModelTests.test_from_pretrained_accelerate_wont_change_results ______________________________________________________________________
self = <tests.test_models_unet.UNetLDMModelTests testMethod=test_from_pretrained_accelerate_wont_change_results>
@unittest.skipIf(torch_device == "cpu", "This test is supposed to run on GPU")
def test_from_pretrained_accelerate_wont_change_results(self):
> model_accelerate, _ = UNet2DModel.from_pretrained(
"fusing/unet-ldm-dummy-update", output_loading_info=True, device_map="auto"
)
test_models_unet.py:152:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\src\diffusers\modeling_utils.py:399: in from_pretrained
accelerate.load_checkpoint_and_dispatch(model, model_file, device_map)
..\..\mambaforge\envs\ldm\lib\site-packages\accelerate\big_modeling.py:367: in load_checkpoint_and_dispatch
return dispatch_model(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
model = UNet2DModel(
(conv_in): Conv2d(4, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(time_proj): Timesteps()
...-05, affine=True)
(conv_act): SiLU()
(conv_out): Conv2d(32, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
device_map = {'': 'cpu'}, main_device = None, state_dict = None, offload_dir = None, offload_buffers = False, preload_module_classes = None
def dispatch_model(
model: nn.Module,
device_map: Dict[str, Union[str, int, torch.device]],
main_device: Optional[torch.device] = None,
state_dict: Optional[Dict[str, torch.Tensor]] = None,
offload_dir: Union[str, os.PathLike] = None,
offload_buffers: bool = False,
preload_module_classes: Optional[List[str]] = None,
):
"""
Dispatches a model according to a given device map. Layers of the model might be spread across GPUs, offloaded on
the CPU or even the disk.
Args:
model (`torch.nn.Module`):
The model to dispatch.
device_map (`Dict[str, Union[str, int, torch.device]]`):
A dictionary mapping module names in the models `state_dict` to the device they should go to. Note that
`"disk"` is accepted even if it's not a proper value for `torch.device`.
main_device (`str`, `int` or `torch.device`, *optional*):
The main execution device. Will default to the first device in the `device_map` different from `"cpu"` or
`"disk"`.
state_dict (`Dict[str, torch.Tensor]`, *optional*):
The state dict of the part of the model that will be kept on CPU.
offload_dir (`str` or `os.PathLike`):
The folder in which to offload the model weights (or where the model weights are already offloaded).
offload_buffers (`bool`, *optional*, defaults to `False`):
Whether or not to offload the buffers with the model parameters.
preload_module_classes (`List[str]`, *optional*):
A list of classes whose instances should load all their weights (even in the submodules) at the beginning
of the forward. This should only be used for classes that have submodules which are registered but not
called directly during the forward, for instance if a `dense` linear layer is registered, but at forward,
`dense.weight` and `dense.bias` are used in some operations instead of calling `dense` directly.
"""
if not is_torch_version(">=", "1.9.0"):
raise NotImplementedError("Model dispatching requires torch >= 1.9.0")
# Error early if the device map is incomplete.
check_device_map(model, device_map)
if main_device is None:
> main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
E IndexError: list index out of range
..\..\mambaforge\envs\ldm\lib\site-packages\accelerate\big_modeling.py:244: IndexError
_____________________________________________________________________________ UNet2DConditionModelTests.test_gradient_checkpointing ______________________________________________________________________________
self = <tests.test_models_unet.UNet2DConditionModelTests testMethod=test_gradient_checkpointing>
def test_gradient_checkpointing(self):
# enable deterministic behavior for gradient checkpointing
init_dict, inputs_dict = self.prepare_init_args_and_inputs_for_common()
model = self.model_class(**init_dict)
model.to(torch_device)
out = model(**inputs_dict).sample
# run the backwards pass on the model. For backwards pass, for simplicity purpose,
# we won't calculate the loss and rather backprop on out.sum()
model.zero_grad()
out.sum().backward()
# now we save the output and parameter gradients that we will use for comparison purposes with
# the non-checkpointed run.
output_not_checkpointed = out.data.clone()
grad_not_checkpointed = {}
for name, param in model.named_parameters():
grad_not_checkpointed[name] = param.grad.data.clone()
model.enable_gradient_checkpointing()
out = model(**inputs_dict).sample
# run the backwards pass on the model. For backwards pass, for simplicity purpose,
# we won't calculate the loss and rather backprop on out.sum()
model.zero_grad()
out.sum().backward()
# now we save the output and parameter gradients that we will use for comparison purposes with
# the non-checkpointed run.
output_checkpointed = out.data.clone()
grad_checkpointed = {}
for name, param in model.named_parameters():
grad_checkpointed[name] = param.grad.data.clone()
# compare the output and parameters gradients
self.assertTrue((output_checkpointed == output_not_checkpointed).all())
for name in grad_checkpointed:
> self.assertTrue(torch.allclose(grad_checkpointed[name], grad_not_checkpointed[name], atol=5e-5))
E AssertionError: False is not true
test_models_unet.py:308: AssertionError
================================================================================================ warnings summary ================================================================================================
..\..\mambaforge\envs\ldm\lib\site-packages\torch\utils\tensorboard\__init__.py:4
C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\torch\utils\tensorboard\__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if not hasattr(tensorboard, "__version__") or LooseVersion(
..\..\mambaforge\envs\ldm\lib\site-packages\torch\utils\tensorboard\__init__.py:6
C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\torch\utils\tensorboard\__init__.py:6: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
) < LooseVersion("1.15"):
..\..\mambaforge\envs\ldm\lib\site-packages\transformers\image_utils.py:239
C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\transformers\image_utils.py:239: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
def resize(self, image, size, resample=PIL.Image.BILINEAR, default_to_square=True, max_size=None):
..\..\mambaforge\envs\ldm\lib\site-packages\transformers\image_utils.py:396
C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\transformers\image_utils.py:396: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
def rotate(self, image, angle, resample=PIL.Image.NEAREST, expand=0, center=None, translate=None, fillcolor=None):
..\..\mambaforge\envs\ldm\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:67
C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:67: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
resample=Image.BICUBIC,
tests/test_models_unet.py::NCSNppModelTests::test_determinism
tests/test_models_unet.py::NCSNppModelTests::test_ema_training
tests/test_models_unet.py::NCSNppModelTests::test_from_pretrained_save_pretrained
tests/test_models_unet.py::NCSNppModelTests::test_model_from_config
tests/test_models_unet.py::NCSNppModelTests::test_output
tests/test_models_unet.py::NCSNppModelTests::test_output_pretrained_ve_large
tests/test_models_unet.py::NCSNppModelTests::test_outputs_equivalence
tests/test_models_unet.py::NCSNppModelTests::test_training
C:\Users\currentuser\diffusers\src\diffusers\models\resnet.py:259: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(kernel, device=hidden_states.device),
tests/test_models_unet.py::NCSNppModelTests::test_determinism
tests/test_models_unet.py::NCSNppModelTests::test_ema_training
tests/test_models_unet.py::NCSNppModelTests::test_from_pretrained_save_pretrained
tests/test_models_unet.py::NCSNppModelTests::test_model_from_config
tests/test_models_unet.py::NCSNppModelTests::test_output
tests/test_models_unet.py::NCSNppModelTests::test_output_pretrained_ve_large
tests/test_models_unet.py::NCSNppModelTests::test_outputs_equivalence
tests/test_models_unet.py::NCSNppModelTests::test_training
C:\Users\currentuser\diffusers\src\diffusers\models\resnet.py:188: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(kernel, device=hidden_states.device),
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================ short test summary info =============================================================================================
FAILED test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate - IndexError: list index out of range
FAILED test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results - IndexError: list index out of range
FAILED test_models_unet.py::UNet2DConditionModelTests::test_gradient_checkpointing - AssertionError: False is not true
============================================================================= 3 failed, 45 passed, 2 skipped, 21 warnings in 15.00s ==============================================================================
System Info
- diffusers version: 0.5.0.dev0 (current head)
- Platform: Windows-10-10.0.22000-SP0
- Python version: 3.8.13
- PyTorch version (GPU?): 1.12.1+cu116 (True)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.22.2
- Using GPU in script?: 3060 6GB mobile
- Using distributed or parallel set-up in script?: No (is using accelerate)
Issue Analytics
- State:
- Created a year ago
- Comments:10 (5 by maintainers)
Top GitHub Comments
Interesting, we have the same issue when testing on macOS: https://github.com/huggingface/diffusers/pull/796 Will follow up with an `empty_cache()` fix once we merge those tests, to see if it helps there too. Thanks for investigating @Thomas-MMJ!
Yes, the issue doesn't come up anymore in our tests with the 1.13 RC PyTorch release (`torch.cuda.empty_cache()` shouldn't affect the `mps` device)
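A minimal sketch of what an `empty_cache()` cleanup between tests could look like, assuming the test classes are `unittest.TestCase` subclasses as in the logs above. `free_gpu_memory` and `GpuCleanupTestCase` are hypothetical names chosen for illustration, not taken from the diffusers test suite, and this is not necessarily the fix that was eventually merged.

```python
import gc
import unittest

try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None


def free_gpu_memory() -> None:
    """Collect unreachable Python objects, then release PyTorch's cached CUDA blocks."""
    # Tensors still referenced from Python cannot be freed, so collect garbage first.
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Returns cached, currently unused blocks to the CUDA driver.
        torch.cuda.empty_cache()


class GpuCleanupTestCase(unittest.TestCase):
    """Hypothetical base class: frees GPU memory after every test method."""

    def tearDown(self):
        super().tearDown()
        free_gpu_memory()
```

A test class such as the one containing test_attention_block_default could inherit from this base class so that memory held by one test is released before the next test asks for a device map. Note that `torch.cuda.empty_cache()` only releases memory that is no longer referenced; models still held in variables stay allocated.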