Trainer not logging to WandB in SageMaker
- transformers version: 4.3.0
- wandb version: 0.10.20
- Platform: SageMaker hosted training with PyTorch estimator.
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
I am using a SageMaker training environment to train BertForSequenceClassification. To do this, I'm passing the model into a Trainer instance and calling trainer.train().
To train in SageMaker, I am using a PyTorch estimator:
    from sagemaker.pytorch import PyTorch

    # role, sagemaker_session, hp, subnets, sec_groups and instance_type
    # are defined earlier in the notebook.
    estimator = PyTorch(
        entry_point='train_classifier.py',
        source_dir='./',
        role=role,
        sagemaker_session=sagemaker_session,
        hyperparameters=hp,
        subnets=subnets,
        security_group_ids=sec_groups,
        framework_version='1.6.0',
        py_version='py3',
        instance_count=1,
        instance_type=instance_type,
        dependencies=['../lib', '../db_conn'],
        use_spot_instances=False,
        volume_size=100,
        # max_wait=max_wait_time_secs
    )
    estimator.fit()
I have tried this with different p2 and p3 instances.
In EC2 or in a SageMaker notebook, the Trainer automatically logs training loss, evaluation loss, and metrics to WandB. With the estimator, I get no training logs.
Anything that I manually log to WandB appears in my dashboard. The only info that doesn’t show up is whatever used to get logged by the Trainer.
I tried setting os.environ["WANDB_DISABLED"] = "false" in my training script, with no luck.
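One way to narrow this down is to print what the training process actually sees before the Trainer is created. The following is a minimal diagnostic sketch, not a fix: the helper name wandb_env_report is hypothetical, and it only inspects the WANDB_DISABLED and WANDB_API_KEY environment variables and whether the wandb package can be found. It reports WANDB_DISABLED as "set" for any non-empty value, on the assumption (worth checking against the installed transformers version) that some versions only test whether the variable is non-empty, in which case even "false" would switch logging off.

```python
import importlib.util
import os


def wandb_env_report(env=None):
    """Describe the W&B-related settings visible to this process.

    `env` defaults to os.environ; pass a plain dict to check other values.
    """
    env = os.environ if env is None else env
    disabled_raw = env.get("WANDB_DISABLED")
    return {
        # True if the wandb package can be located inside the container.
        "wandb_importable": importlib.util.find_spec("wandb") is not None,
        # True if an API key is available through the environment.
        "api_key_present": bool(env.get("WANDB_API_KEY")),
        # Raw value of WANDB_DISABLED, or None if it is unset.
        "wandb_disabled_raw": disabled_raw,
        # True for ANY non-empty value, including the string "false".
        "wandb_disabled_is_set": bool(disabled_raw),
    }


if __name__ == "__main__":
    # Run at the top of the training script so the output lands in the job logs.
    for key, value in wandb_env_report().items():
        print(f"{key}: {value}")
```

Comparing this output between the SageMaker notebook (where logging works) and the estimator job (where it does not) should show whether the two environments differ in any of these settings.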
Top GitHub Comments
Yup
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.