Logging issue with 0.9.0 and current dev branch
See original GitHub issue

**Environment**
- OS: Ubuntu 20.04
- Hardware (GPU, or instance type): 8xA100
- cuda: 11.3
- cudnn: 8
- pytorch: 1.12.1
- composer: dev branch (installed from source) / 0.9.0 (installed from pip)
- transformers: 4.21.2
**To reproduce**
I have the following definition of a BLOOM model, mostly copied from the GPT2 definition within composer.
```python
from typing import Optional

import transformers
from composer.metrics.nlp import HFCrossEntropy, Perplexity
from composer.models import ComposerModel
from composer.models.huggingface import HuggingFaceModel


def create_bloom(
    model_name: str,
    tokenizer_name: str,
    use_pretrained: Optional[bool] = False,
    model_config: Optional[dict] = None,
    gradient_checkpointing: Optional[bool] = False,
) -> ComposerModel:
    if not model_config:
        model_config = {}

    # Either load pretrained weights or build the model from a config alone.
    if use_pretrained:
        model = transformers.AutoModelForCausalLM.from_pretrained(model_name, **model_config)
    else:
        config = transformers.AutoConfig.from_pretrained(model_name, **model_config)
        model = transformers.AutoModelForCausalLM.from_config(config)

    tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_name)

    if gradient_checkpointing:
        model.gradient_checkpointing_enable()

    # Wrap the HF model so composer tracks the HF cross-entropy and perplexity metrics.
    return HuggingFaceModel(model=model, tokenizer=tokenizer, metrics=[HFCrossEntropy(), Perplexity()])
```
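For context, building the model looks roughly like the following; the checkpoint name here is a placeholder for illustration, not necessarily the one used in my run:

```python
# Hypothetical checkpoint name, purely for illustration.
composer_model = create_bloom(
    model_name='bigscience/bloom-560m',
    tokenizer_name='bigscience/bloom-560m',
    use_pretrained=True,
)
```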
There are two issues: one with the 0.9.0 release and the other with the dev branch.
Steps to reproduce the behavior:
- Running LM training with gradient accumulation on 0.9.0 does not plot the HF metrics in wandb, although the step counts for the metrics it does log are correct. You can see that the logs don't show the Perplexity and CrossEntropy metrics.
- Running LM training with gradient accumulation on the dev branch plots the HF metrics, but gets the step count completely wrong while plotting them. You can see metrics being plotted for step 266 when only 38 batches have been trained.
- If I run the same training with DeepSpeed stage 2 enabled (dev branch), the metrics are plotted with the correct step count. (A sketch of the training invocation is shown after this list.)
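For reference, the shape of the training invocation is roughly the following. This is a minimal sketch: `train_dataloader` is assumed to be a dataloader over tokenized text, and the `grad_accum` value is a placeholder rather than my exact setting.

```python
from composer import Trainer
from composer.loggers import WandBLogger

trainer = Trainer(
    model=composer_model,               # the HuggingFaceModel built by create_bloom above
    train_dataloader=train_dataloader,  # assumed: a dataloader yielding tokenized batches
    max_duration='1ep',
    grad_accum=8,                       # gradient accumulation is what triggers the bug
    loggers=[WandBLogger()],
    # Enabling DeepSpeed ZeRO stage 2 instead makes the step counts correct again:
    # deepspeed_config={'zero_optimization': {'stage': 2}},
)
trainer.fit()
```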
**Expected behavior**
Both the Perplexity and CrossEntropy metrics are plotted with the correct step count.
yes
OK, @ananyahjha93, the PR that was just merged should fix the issue. Give it another try and let us know if it works!