[tests] where to put deepspeed + fairscale tests

As a split off from this comment https://github.com/huggingface/transformers/pull/10039#pullrequestreview-585482462 we need to find a new home for deepspeed + fairscale tests.

Currently they are under examples/seq2seq because they rely on finetune_trainer.py (run_seq2seq.py once the transition is over).

@sgugger suggests keeping the seq2seq folder as simple as possible. We also have ds_config.json there, which could be moved too.

Seeing what’s happening in the fairscale land - I think we will need a bunch of various tests there in the future too.

So where should we put the deepspeed + fairscale tests?

Ideally they should be put under the main tests, since they are part of the trainer core, but I’m not sure whether reaching across test suites is a clean approach.

My fantasy is that one day transformers will have a few essential tools that aren’t examples, and those will then live somewhere in the main tree, perhaps src/transformers/apps, and then it’d be easy to have such tests under tests.

So suggestions for now:

  1. create examples/deepspeed and examples/fairscale
  2. create examples/distributed and perhaps have all those extensions tested in one folder
  3. create a new 3rd test suite for integrations
  4. create tests/deepspeed - but as voiced earlier I’m not sure how reaching across a test suite will work - need to try - also this proposes to change the current flat structure of tests.
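
To make the "reaching across a test suite" concern in option 4 concrete, here is a minimal sketch of how a test file placed under tests/deepspeed might locate the script and config that currently sit in examples/seq2seq. The helper names (find_repo_root, is_installed, example_paths) are hypothetical and not part of transformers; the only paths assumed are the ones mentioned above.

```python
import importlib.util
from pathlib import Path


def find_repo_root(start: Path, marker: str = "examples") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no parent of {start} contains a '{marker}' directory")


def is_installed(module_name: str) -> bool:
    """Return True if an optional dependency (e.g. deepspeed) can be imported."""
    return importlib.util.find_spec(module_name) is not None


def example_paths(test_dir: Path) -> dict:
    """Resolve the example script and DeepSpeed config relative to the repo root.

    Assumes the layout described in this issue: both files live in
    examples/seq2seq. Adjust if they are moved.
    """
    seq2seq = find_repo_root(test_dir) / "examples" / "seq2seq"
    return {
        "script": seq2seq / "finetune_trainer.py",
        "ds_config": seq2seq / "ds_config.json",
    }
```

A test module could call example_paths(Path(__file__).parent) and skip itself when is_installed("deepspeed") is false. Whether this kind of cross-directory lookup counts as clean is exactly the open question here.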

Perhaps you have other ideas.

@sgugger, @patrickvonplaten, @LysandreJik, @patil-suraj

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
stas00 commented, Feb 8, 2021

Thank you for the clarification, @sgugger. I will start working on that transition.

1 reaction
sgugger commented, Feb 8, 2021

Could you please clarify, do you prefer most/all of the examples/tests to be flat, or would grouping make things easier to make sense of - I’m asking since some tests come with extra files (as is the case with ds_config files) - so examples/tests/deepspeed, …

We can certainly have several files once they are all together in one folder. I’d just like the examples subfolders to be clean; our internal testing should be set up so it’s the easiest for us to understand/debug.
