[BUG] RuntimeError: Ninja is required to load C++ extensions
Hi,
I am getting the following error when running pretrain_gpt.sh:
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja … [OKAY]
op name … installed … compatible
cpu_adam … [NO] … [OKAY]
cpu_adagrad … [NO] … [OKAY]
fused_adam … [NO] … [OKAY]
fused_lamb … [NO] … [OKAY]
sparse_attn … [NO] … [OKAY]
transformer … [NO] … [OKAY]
stochastic_transformer … [NO] … [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io … [NO] … [NO]
transformer_inference … [NO] … [OKAY]
utils … [NO] … [OKAY]
quantizer … [NO] … [OKAY]
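Every op above shows [NO] under "installed" because it is left to JIT compilation, and that JIT path needs a working ninja on PATH. As a quick sanity check from the same environment, the following minimal sketch (standard-library calls plus torch's own helpers; nothing here is specific to this repository) shows whether PyTorch can actually find and run ninja:

```python
import os
import shutil
from torch.utils import cpp_extension

# Where does the shell resolve `ninja` to, if anywhere?
ninja_path = shutil.which("ninja")
print("ninja on PATH:", ninja_path)

# Note: `which` may point at a wrapper that exists and is executable
# yet still fails when run (as in the PermissionError below),
# so torch's own check is the one that matters.
if ninja_path:
    print("executable bit:", os.access(ninja_path, os.X_OK))

# This is what fails with "Ninja is required to load C++ extensions".
print("torch sees ninja:", cpp_extension.is_ninja_available())
```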
DeepSpeed general environment info:
torch install path … ['/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch']
torch version … 1.8.2+cu111
torch cuda version … 11.1
nvcc version … 11.1
deepspeed install path … ['/qfs/people/shar703/scripts/mega_ai/deepspeed_megatron/DeepSpeed/deepspeed']
deepspeed info … 0.5.9+1d295ff, 1d295ff, master
deepspeed wheel compiled w. … torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=1ac4a44 git_branch=main ****
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
using torch.float16 for parameters …
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 … False
adam_beta1 … 0.9
adam_beta2 … 0.999
adam_eps … 1e-08
adlr_autoresume … False
adlr_autoresume_interval … 1000
apply_query_key_layer_scaling … True
apply_residual_connection_post_layernorm … False
attention_dropout … 0.1
attention_softmax_in_fp32 … False
bert_binary_head … True
bert_load … None
bf16 … False
bias_dropout_fusion … True
bias_gelu_fusion … True
biencoder_projection_dim … 0
biencoder_shared_query_context_model … False
block_data_path … None
checkpoint_activations … True
checkpoint_in_cpu … False
checkpoint_num_layers … 1
clip_grad … 1.0
consumed_train_samples … 0
consumed_train_tokens … 0
consumed_valid_samples … 0
contigious_checkpointing … False
cpu_optimizer … False
cpu_torch_adam … False
curriculum_learning … False
data_impl … infer
data_parallel_size … 1
data_path … ['cord19/chemistry_cord19_abstract_document']
dataloader_type … single
DDP_impl … local
decoder_seq_length … None
deepscale … False
deepscale_config … None
deepspeed … False
deepspeed_activation_checkpointing … False
deepspeed_config … None
deepspeed_mpi … False
distribute_checkpointed_activations … False
distributed_backend … nccl
embedding_path … None
encoder_seq_length … 1024
eod_mask_loss … False
eval_interval … 100
eval_iters … 10
evidence_data_path … None
exit_duration_in_mins … None
exit_interval … None
ffn_hidden_size … 4096
finetune … False
fp16 … True
fp16_lm_cross_entropy … False
fp32_residual_connection … False
global_batch_size … 8
hidden_dropout … 0.1
hidden_size … 1024
hysteresis … 2
ict_head_size … None
ict_load … None
img_dim … 224
indexer_batch_size … 128
indexer_log_interval … 1000
init_method_std … 0.02
init_method_xavier_uniform … False
initial_loss_scale … 4294967296
kv_channels … 64
layernorm_epsilon … 1e-05
lazy_mpu_init … None
load … checkpoints/gpt2_345m
local_rank … None
log_batch_size_to_tensorboard … False
log_interval … 10
log_learning_rate_to_tensorboard … True
log_loss_scale_to_tensorboard … True
log_num_zeros_in_grad … False
log_params_norm … False
log_timers_to_tensorboard … False
log_validation_ppl_to_tensorboard … False
loss_scale … None
loss_scale_window … 1000
lr … 0.00015
lr_decay_iters … 320000
lr_decay_samples … None
lr_decay_style … cosine
lr_decay_tokens … None
lr_warmup_fraction … 0.01
lr_warmup_iters … 0
lr_warmup_samples … 0
make_vocab_size_divisible_by … 128
mask_prob … 0.15
masked_softmax_fusion … True
max_position_embeddings … 1024
memory_centric_tiled_linear … False
merge_file … …/deepspeed_megatron/gpt_files/gpt2-merges.txt
micro_batch_size … 4
min_loss_scale … 1.0
min_lr … 0.0
mmap_warmup … False
no_load_optim … None
no_load_rng … None
no_save_optim … None
no_save_rng … None
num_attention_heads … 16
num_channels … 3
num_classes … 1000
num_layers … 24
num_layers_per_virtual_pipeline_stage … None
num_workers … 2
onnx_safe … None
openai_gelu … False
optimizer … adam
override_lr_scheduler … False
params_dtype … torch.float16
partition_activations … False
patch_dim … 16
pipeline_model_parallel_size … 1
profile_backward … False
query_in_block_prob … 0.1
rampup_batch_size … None
rank … 0
remote_device … none
reset_attention_mask … False
reset_position_ids … False
retriever_report_topk_accuracies … []
retriever_score_scaling … False
retriever_seq_length … 256
sample_rate … 1.0
save … checkpoints/gpt2_345m
save_interval … 500
scatter_gather_tensors_in_pipeline … True
scattered_embeddings … False
seed … 1234
seq_length … 1024
sgd_momentum … 0.9
short_seq_prob … 0.1
split … 969, 30, 1
split_transformers … False
synchronize_each_layer … False
tensor_model_parallel_size … 1
tensorboard_dir … None
tensorboard_log_interval … 1
tensorboard_queue_size … 1000
tile_factor … 1
titles_data_path … None
tokenizer_type … GPT2BPETokenizer
train_iters … 500000
train_samples … None
train_tokens … None
use_checkpoint_lr_scheduler … False
use_contiguous_buffers_in_ddp … False
use_cpu_initialization … None
use_one_sent_docs … False
use_pin_memory … False
virtual_pipeline_model_parallel_size … None
vocab_extra_ids … 0
vocab_file … …/deepspeed_megatron/gpt_files/gpt2-vocab.json
weight_decay … 0.01
world_size … 1
zero_allgather_bucket_size … 0.0
zero_contigious_gradients … False
zero_reduce_bucket_size … 0.0
zero_reduce_scatter … False
zero_stage … 1.0
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 2
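The last line above follows directly from the batch arguments: number of micro-batches = global_batch_size / (micro_batch_size × data_parallel_size) = 8 / (4 × 1) = 2.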
building GPT2BPETokenizer tokenizer …
padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
initializing torch distributed …
initializing tensor model parallel with size 1
initializing pipeline model parallel with size 1
setting random seeds to 1234 …
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
compiling dataset index builder …
make: Entering directory '/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/data'
done with dataset index builder. Compilation time: 0.051 seconds
compiling and loading fused kernels …
Traceback (most recent call last):
  File "/people/shar703/anaconda3/envs/deepspeed/bin/ninja", line 33, in <module>
    sys.exit(load_entry_point('ninja', 'console_scripts', 'ninja')())
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/__init__.py", line 51, in ninja
    raise SystemExit(_program('ninja', sys.argv[1:]))
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/__init__.py", line 47, in _program
    return subprocess.call([os.path.join(BIN_DIR, name)] + args)
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/data/bin/ninja'
Traceback (most recent call last):
  File "pretrain_gpt.py", line 231, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/training.py", line 96, in pretrain
    initialize_megatron(extra_args_provider=extra_args_provider,
  File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/initialize.py", line 89, in initialize_megatron
    _compile_dependencies()
  File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/initialize.py", line 137, in _compile_dependencies
    fused_kernels.load(args)
  File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/fused_kernels/__init__.py", line 71, in load
    scaled_upper_triang_masked_softmax_cuda = _cpp_extention_load_helper(
  File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/fused_kernels/__init__.py", line 47, in _cpp_extention_load_helper
    return cpp_extension.load(
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1079, in load
    return _jit_compile(
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1292, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1373, in _write_ninja_file_and_build_library
    verify_ninja_availability()
  File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1429, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions
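The first traceback explains the second: the ninja console script from the egg install re-executes a binary bundled at .../ninja/data/bin/ninja, that exec fails with [Errno 13] Permission denied, so torch's verify_ninja_availability() then reports ninja as missing. A small sketch for inspecting that bundled binary (the path is copied from the traceback; only standard-library calls are used):

```python
import os
import stat

# Path copied from the PermissionError in the traceback above.
bundled = ("/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/"
           "ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/data/bin/ninja")

if os.path.exists(bundled):
    mode = stat.S_IMODE(os.stat(bundled).st_mode)
    print("permissions:", oct(mode))
    print("executable by current user:", os.access(bundled, os.X_OK))
else:
    print("bundled ninja binary is missing")
```

If the file is present but not executable, restoring the execute bit or reinstalling ninja usually resolves it; if site-packages lives on a filesystem mounted noexec, you instead need to point PATH at a ninja binary installed somewhere executable, which is essentially the workaround below.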
Top GitHub Comments
A temporary workaround is to manually add the directory containing a working ninja binary to the PATH environment variable inside the torch/utils/cpp_extension.py file, so the availability check and the build can find it.
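For illustration, a minimal sketch of the same idea done from the training script or launcher rather than by editing cpp_extension.py; /some/dir/with/ninja is a placeholder for wherever an executable ninja actually lives on your system:

```python
import os
from torch.utils import cpp_extension

# Placeholder: directory that contains a working, executable `ninja` binary.
NINJA_DIR = "/some/dir/with/ninja"

# Prepend it so both torch's availability check and the JIT build find it
# ahead of the broken egg-installed wrapper.
os.environ["PATH"] = NINJA_DIR + os.pathsep + os.environ.get("PATH", "")

cpp_extension.verify_ninja_availability()  # raises RuntimeError if ninja is still not usable
```

A more durable fix is usually to make sure an executable ninja is on PATH before launching at all (for example one installed via conda or the system package manager), rather than patching PyTorch sources.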