Error building extension 'cpu_adam'
Hey guys, I’m having a problem getting DeepSpeed working with XLM-Roberta. I’m trying to run it on an Amazon Linux machine, which is based on Red Hat. Here are the versions of the packages/dependencies I’m using:
cuda version: 10.2
transformers: 4.4.2
pytorch: 1.7.1
deepspeed: 0.3.13
gcc/c++/g++: (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
I must admit I had some issues upgrading the CUDA version from the instance’s default 10.0 to 10.2, and GCC from 4.8.5 to 7.2.1, but since I no longer get the error that the installed CUDA version differs from the one torch was built with, nor the one about GCC being older than version 5, I’d assume I’m in the clear.
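A quick sanity check of the toolchain torch’s JIT extension builder will actually pick up (a minimal sketch; the values in the comments are just what this setup should report):

# Sanity-check the CUDA/GCC toolchain that torch's JIT extension builder will use.
import subprocess
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch:", torch.__version__)                    # 1.7.1
print("torch built with CUDA:", torch.version.cuda)   # 10.2
print("CUDA_HOME seen by torch:", CUDA_HOME)          # should point at the CUDA 10.2 toolkit
print(subprocess.check_output(["nvcc", "--version"]).decode())
print(subprocess.check_output(["gcc", "--version"]).decode())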
Here’s the essential part of the code I’m running (from a notebook):
import os
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '9994' # modify if RuntimeError: Address already in use
os.environ['RANK'] = "0"
os.environ['LOCAL_RANK'] = "0"
os.environ['WORLD_SIZE'] = "1"
from transformers import Trainer, TrainingArguments, XLMRobertaForSequenceClassification, XLMRobertaTokenizer
model = XLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base')
training_args = TrainingArguments(
    output_dir="./results",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    save_steps=500,
    save_total_limit=2,
    deepspeed="my_ds_config.json"
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)
trainer.train()
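train_dataset and val_dataset come from earlier notebook cells; a minimal stand-in with placeholder texts and labels (not the real data) would look something like this:

import torch
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')

class SimpleDataset(torch.utils.data.Dataset):
    # Hypothetical placeholder dataset, only here to make the snippet self-contained.
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

train_dataset = SimpleDataset(["example sentence"] * 8, [0] * 8)
val_dataset = SimpleDataset(["example sentence"] * 8, [0] * 8)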
Here’s the content of my config file:
{
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "cpu_offload": true
    },
    "optimizer": {
        "type": "Adam",
        "params": {
            "adam_w_mode": true,
            "lr": 3e-5,
            "betas": [ 0.9, 0.999 ],
            "eps": 1e-8,
            "weight_decay": 3e-7
        }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 3e-5,
            "warmup_num_steps": 500
        }
    }
}
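Note that "cpu_offload": true is the setting that routes the optimizer through DeepSpeedCPUAdam (visible in the stack trace below), which in turn forces the cpu_adam extension to be JIT-built. A quick check that the file parses and that this flag is indeed set (a minimal sketch, assuming the file sits next to the notebook):

import json

# Confirm the config parses and that cpu_offload is the flag pulling in DeepSpeedCPUAdam.
with open("my_ds_config.json") as f:
    cfg = json.load(f)

print(cfg["zero_optimization"]["cpu_offload"])  # True -> DeepSpeedCPUAdam -> cpu_adam op
print(cfg["optimizer"]["type"])                 # "Adam"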
Here’s the output of ds_report for my setup:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
/bin/sh: line 0: type: llvm-config: not found
/bin/sh: line 0: type: llvm-config-9: not found
[WARNING] sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 10.2
nvcc version ..................... 10.2
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed']
deepspeed info ................... 0.3.13+22d5a1f, 22d5a1f, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.2
And finally, here’s the stack trace:
[2021-03-24 15:29:36,478] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.13, git-hash=unknown, git-branch=unknown
[2021-03-24 15:29:36,494] [INFO] [engine.py:77:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module cpu_adam, skipping build step...
Loading extension module cpu_adam...
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-131-cc14ac05ecbb> in <module>
30 )
31
---> 32 trainer.train()
~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
901 delay_optimizer_creation = self.sharded_ddp is not None and self.sharded_ddp != ShardedDDPOption.SIMPLE
902 if self.args.deepspeed:
--> 903 model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
904 self.model = model.module
905 self.model_wrapped = model # will get further wrapped in DDP
~/anaconda3/envs/python3/lib/python3.6/site-packages/transformers/integrations.py in init_deepspeed(trainer, num_training_steps)
416 model=model,
417 model_parameters=model_parameters,
--> 418 config_params=config,
419 )
420
~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/__init__.py in initialize(args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config_params)
123 dist_init_required=dist_init_required,
124 collate_fn=collate_fn,
--> 125 config_params=config_params)
126 else:
127 assert mpu is None, "mpu must be None with pipeline parallelism"
~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in __init__(self, args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config_params, dont_change_device)
181 self.lr_scheduler = None
182 if model_parameters or optimizer:
--> 183 self._configure_optimizer(optimizer, model_parameters)
184 self._configure_lr_scheduler(lr_scheduler)
185 self._report_progress(0)
~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in _configure_optimizer(self, client_optimizer, model_parameters)
596 logger.info('Using client Optimizer as basic optimizer')
597 else:
--> 598 basic_optimizer = self._configure_basic_optimizer(model_parameters)
599 if self.global_rank == 0:
600 logger.info(
~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/runtime/engine.py in _configure_basic_optimizer(self, model_parameters)
665 optimizer = DeepSpeedCPUAdam(model_parameters,
666 **optimizer_parameters,
--> 667 adamw_mode=effective_adam_w_mode)
668 else:
669 from deepspeed.ops.adam import FusedAdam
~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/adam/cpu_adam.py in __init__(self, model_params, lr, bias_correction, betas, eps, weight_decay, amsgrad, adamw_mode)
76 DeepSpeedCPUAdam.optimizer_id = DeepSpeedCPUAdam.optimizer_id + 1
77 self.adam_w_mode = adamw_mode
---> 78 self.ds_opt_adam = CPUAdamBuilder().load()
79
80 self.ds_opt_adam.create_adam(self.opt_id,
~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py in load(self, verbose)
213 return importlib.import_module(self.absolute_name())
214 else:
--> 215 return self.jit_load(verbose)
216
217 def jit_load(self, verbose=True):
~/anaconda3/envs/python3/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py in jit_load(self, verbose)
250 extra_cuda_cflags=self.nvcc_args(),
251 extra_ldflags=self.extra_ldflags(),
--> 252 verbose=verbose)
253 build_duration = time.time() - start_build
254 if verbose:
~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
1089 if isinstance(cuda_sources, str):
1090 cuda_sources = [cuda_sources]
-> 1091
1092 cpp_sources.insert(0, '#include <torch/extension.h>')
1093
~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
1315
1316
-> 1317 def verify_ninja_availability():
1318 r'''
1319 Raises ``RuntimeError`` if `ninja <https://ninja-build.org/>`_ build system is not
~/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _import_module_from_library(module_name, path, is_python_module)
1697 sources,
1698 objects,
-> 1699 ldflags,
1700 library_target,
1701 with_cuda) -> None:
~/anaconda3/envs/python3/lib/python3.6/imp.py in find_module(name, path)
295 break # Break out of outer loop when breaking out of inner loop.
296 else:
--> 297 raise ImportError(_ERR_MSG.format(name), name=name)
298
299 encoding = None
ImportError: No module named 'cpu_adam'
Thanks in advance for your help!
Top GitHub Comments
Thanks @stas00 for clarifying this : )
This is no longer needed in deepspeed since https://github.com/microsoft/DeepSpeed/pull/825, and transformers master has been adjusted accordingly. You just need to have the env var LOCAL_RANK set.

Not at all. You can do your own integration and not rely on the HF Trainer.

If you do use the transformers Trainer, for the time being while this is all new you must use the transformers master branch, as frequent deepspeed-related updates are made.

If you have build problems please make sure you read https://huggingface.co/transformers/main_classes/trainer.html#installation-notes, though looking at the OP I think you have all the right components. Just check that PATH/LD_LIBRARY_PATH are good.

Perhaps try to pre-build deepspeed: https://github.com/microsoft/DeepSpeed/issues/885#issuecomment-808339237