[BUG] 'NoneType' object has no attribute 'reserve_partitioned_swap_space' with params offloading to nvme enabled
Describe the bug
Exception 'NoneType' object has no attribute 'reserve_partitioned_swap_space' when parameter offloading to NVMe is enabled.
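For context on what this error means mechanically: the reserve call was made on an object that was still None, i.e. the swapper was never constructed before being used. A minimal, purely illustrative sketch of that failure mode (all class and attribute names here are hypothetical, not DeepSpeed's actual internals):

# Sketch of the failure mode; names are hypothetical, not DeepSpeed's code.
class ParamSwapper:
    """Stand-in for an NVMe tensor swapper."""

    def reserve_partitioned_swap_space(self, numel: int) -> None:
        print(f"reserving swap space for {numel} elements")


class Stage3OptimizerSketch:
    def __init__(self) -> None:
        # Expected to be replaced with a real swapper while wiring up NVMe
        # parameter offload; if that wiring is skipped, it stays None.
        self.param_swapper = None

    def reserve_swap_space(self, numel: int) -> None:
        # Fails whenever param_swapper was never initialized.
        self.param_swapper.reserve_partitioned_swap_space(numel)


try:
    Stage3OptimizerSketch().reserve_swap_space(10**8)
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'reserve_partitioned_swap_space'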
To Reproduce
Run training with this configuration:
{
  "train_batch_size": 15,
  "fp16": {
    "enabled": true,
    "min_loss_scale": 1,
    "opt_level": "O3"
  },
  "zero_optimization": {
    "stage": 3,
    "offload_param": {
      "device": "nvme",
      "nvme_path": "/home/deepschneider/deepspeed",
      "buffer_count": 5,
      "buffer_size": 1e8,
      "max_in_cpu": 1e9
    },
    "offload_optimizer": {
      "device": "nvme",
      "nvme_path": "/home/deepschneider/deepspeed",
      "buffer_count": 4,
      "pipeline_read": false,
      "pipeline_write": false,
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "contiguous_gradients": true,
    "overlap_comm": true,
    "aio": {
      "block_size": 1048576,
      "queue_depth": 8,
      "thread_count": 1,
      "single_submit": false,
      "overlap_events": true
    }
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 5e-05,
      "betas": [0.9, 0.999],
      "eps": 1e-08
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 5e-05,
      "warmup_num_steps": 100
    }
  }
}
I'm getting the 'NoneType' object has no attribute 'reserve_partitioned_swap_space' exception.
This configuration, identical except that the offload_param block is removed, works fine:
{
  "train_batch_size": 15,
  "fp16": {
    "enabled": true,
    "min_loss_scale": 1,
    "opt_level": "O3"
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "nvme",
      "nvme_path": "/home/deepschneider/deepspeed",
      "buffer_count": 4,
      "pipeline_read": false,
      "pipeline_write": false,
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "contiguous_gradients": true,
    "overlap_comm": true,
    "aio": {
      "block_size": 1048576,
      "queue_depth": 8,
      "thread_count": 1,
      "single_submit": false,
      "overlap_events": true
    }
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 5e-05,
      "betas": [0.9, 0.999],
      "eps": 1e-08
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 5e-05,
      "warmup_num_steps": 100
    }
  }
}
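The two configs above differ only in the offload_param block; a quick sanity check of that claim (the file names below are placeholders for the failing and working configs):

# Confirm the failing config differs from the working one only by
# "offload_param"; file names are placeholders, not from the original issue.
import json

with open("ds_config_failing.json") as f:
    failing = json.load(f)
with open("ds_config_working.json") as f:
    working = json.load(f)

extra = set(failing["zero_optimization"]) - set(working["zero_optimization"])
print(extra)  # expected: {'offload_param'}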
Expected behavior
Both configurations should work.
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
async_io ............... [YES] ...... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/torch']
torch version .................... 1.10.0+cu113
torch cuda version ............... 11.3
nvcc version ..................... 11.3
deepspeed install path ........... ['/home/deepschneider/PycharmProjects/gpt-neo-fine-tuning-example/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.5.6+43e6432, 43e6432, cpu-adam/fix-scalar-compile
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3
System info:
- OS: Ubuntu 20.04
- GPU: 1x A6000
- Interconnects: none (single machine)
- Python version: 3.8.10
Launcher context
Hugging Face Trainer (not the deepspeed launcher or MPI).
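For completeness, a minimal sketch of how such a DeepSpeed JSON config is typically handed to the Hugging Face Trainer (the output directory and config path below are placeholders):

# Sketch only: paths are placeholders. Passing the JSON path via `deepspeed`
# makes the Trainer initialize the DeepSpeed engine with that config.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=15,  # single GPU, consistent with train_batch_size=15
    fp16=True,
    deepspeed="ds_config.json",      # the failing config shown above
)
# Trainer(model=..., args=args, train_dataset=...).train() then runs the
# fine-tuning under ZeRO stage 3 with NVMe offload.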
Top GitHub Comments
@dredwardhyde, thanks for confirming your experience. Actually, this 6B model should run easily with the Hugging Face Trainer using the parameter and optimizer offload configurations. I think a few things could help:
We can help with getting this fine-tuning working, so can you please open a new issue for that purpose? Thanks.
@dredwardhyde, can you please test the PR?
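For reference, an unmerged PR can usually be tried by installing straight from its GitHub ref, e.g. pip install git+https://github.com/microsoft/DeepSpeed.git@refs/pull/<PR_NUMBER>/head (the PR number is not shown in this thread, so it is left as a placeholder).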