Logging issue with 0.9.0 and current dev branch

Environment

  • OS: Ubuntu 20.04
  • Hardware (GPU, or instance type): 8xA100
  • cuda: 11.3
  • cudnn: 8
  • pytorch: 1.12.1
  • composer: dev branch installed from source, and 0.9.0 installed from pip
  • transformers: 4.21.2

To reproduce

I have the following definition of a BLOOM model, mostly copied from the GPT-2 definition within Composer.

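from typing import Optional

import transformers
# Import paths below are assumed to match Composer's GPT-2 definition.
from composer.metrics.nlp import HFCrossEntropy, Perplexity
from composer.models import ComposerModel
from composer.models.huggingface import HuggingFaceModel


def create_bloom(
    model_name: str,
    tokenizer_name: str,
    use_pretrained: Optional[bool] = False,
    model_config: Optional[dict] = None,
    gradient_checkpointing: Optional[bool] = False,
) -> ComposerModel:

    if not model_config:
        model_config = {}

    # Load pretrained weights, or build a randomly initialized model from the config.
    if use_pretrained:
        model = transformers.AutoModelForCausalLM.from_pretrained(model_name, **model_config)
    else:
        config = transformers.AutoConfig.from_pretrained(model_name, **model_config)
        model = transformers.AutoModelForCausalLM.from_config(config)

    tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_name)

    if gradient_checkpointing:
        model.gradient_checkpointing_enable()

    # Wrap the HF model so Composer can train it and track the two LM metrics.
    return HuggingFaceModel(model=model, tokenizer=tokenizer, metrics=[HFCrossEntropy(), Perplexity()])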

There are 2 issues, one with the 0.9.0 release and the other with the dev branch.

Steps to reproduce the behavior:

  1. Running LM training with gradient accumulation on 0.9.0 doesn’t plot the HF metrics in wandb, but the step counts are correct for the metrics that are logged.

You can see that the logs don’t show the Perplexity and CrossEntropy metrics. (Two screenshots in the original issue.)

  2. Running LM training with gradient accumulation on the dev branch plots the HF metrics, but the step count used when plotting them is completely wrong.

You can see metrics being plotted for 266 steps when only 38 batches have been trained. (Two screenshots in the original issue.)

  3. If I run the same training with DeepSpeed stage 2 enabled (dev branch), the metrics are plotted with the correct step count.
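
To make the size of the mismatch in item 2 concrete, here is a toy sketch in plain Python. It is not Composer's code and the issue does not state the root cause. It only shows how a step counter incremented once per microbatch drifts away from one incremented once per batch. The function `count_logged_steps` and the value `GRAD_ACCUM = 7` are hypothetical. The accumulation setting is not given in the report, and 7 is chosen only because 38 * 7 == 266.

```python
def count_logged_steps(num_batches: int, grad_accum: int, log_per_microbatch: bool) -> int:
    """Count how many distinct x-axis steps a logger would record.

    Toy model only: each batch is split into `grad_accum` microbatches,
    and the optimizer steps once per batch.
    """
    logged_steps = 0
    for _ in range(num_batches):
        for _ in range(grad_accum):
            # A forward/backward pass on one microbatch would happen here.
            if log_per_microbatch:
                logged_steps += 1
        # The optimizer step happens once per batch.
        if not log_per_microbatch:
            logged_steps += 1
    return logged_steps


NUM_BATCHES = 38  # number of batches trained, from the report
GRAD_ACCUM = 7    # hypothetical value, not stated in the report

per_batch = count_logged_steps(NUM_BATCHES, GRAD_ACCUM, log_per_microbatch=False)
per_microbatch = count_logged_steps(NUM_BATCHES, GRAD_ACCUM, log_per_microbatch=True)
print(per_batch, per_microbatch)
```

Under these assumptions the per-batch counter matches the number of batches trained, while the per-microbatch counter is larger by a factor of the accumulation setting.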

Expected behavior

Both Perplexity and CrossEntropy metrics are plotted with correct step count.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
ananyahjha93 commented, Aug 29, 2022

yes

0 reactions
eracah commented, Aug 31, 2022

OK, @ananyahjha93, this PR that was just merged should fix the issue. Give it another try and let us know if it works!
