model is moved too early in create_supervised_trainer/evaluator
Hi, first of all, thanks for the great library!
I have a general “bug” report, or rather a question, about the specific implementation of create_supervised_trainer and create_supervised_evaluator.
As seen for example here:
https://github.com/pytorch/ignite/blob/master/ignite/engine/__init__.py#L64
The moment a supervised evaluator/trainer is created, the model is moved to the specified device, rather than during the inference stage.
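To make the point concrete, here is a stripped-down sketch of that pattern (simplified from the linked code; metrics, prepare_batch, and the other arguments are left out):

```python
import torch
from ignite.engine import Engine

def create_supervised_evaluator_sketch(model, device=None):
    # The model is moved once, here, when the evaluator is created ...
    if device:
        model.to(device)

    def _inference(engine, batch):
        # ... while only the batch is moved here, at inference time.
        model.eval()
        with torch.no_grad():
            x, y = batch
            if device:
                x, y = x.to(device), y.to(device)
            return model(x), y

    return Engine(_inference)
```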
I agree that this usually has no practical impact, but I would argue that the model should be moved in the _inference function and not when the evaluator/trainer is created.
A somewhat unusual example, but similar to my case: I have multiple evaluators and would like to run one on the CPU and one on the GPU, because a different library doesn’t support GPUs yet and I would like to wrap it in an “ignite” style to keep the code consistent. With the current behavior, the run steps fail, because the model is not moved again at the inference stage.
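Roughly, the setup looks like this (a sketch with made-up placeholder data; the "cuda" device assumes a GPU is available):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from ignite.engine import create_supervised_evaluator
from ignite.metrics import Loss

model = nn.Linear(4, 2)
data = TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,)))
loader = DataLoader(data, batch_size=4)
metrics = {"nll": Loss(nn.CrossEntropyLoss())}

gpu_evaluator = create_supervised_evaluator(model, metrics=metrics, device="cuda")
cpu_evaluator = create_supervised_evaluator(model, metrics=metrics, device="cpu")

# The second call has already moved the model back to the CPU, so running the
# GPU evaluator now fails: its batches are sent to the GPU inside _inference
# while the model's parameters live on the CPU.
gpu_evaluator.run(loader)
cpu_evaluator.run(loader)
```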
I can imagine that there is a reason why it is the way it is, but I am wondering whether a future version couldn’t do the model.to(device) step in the _inference function instead. 😃
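For illustration, a hypothetical variant of the sketch above that does what I am asking for (not a proposal for the exact API, just the idea):

```python
import torch
from ignite.engine import Engine

def create_supervised_evaluator_lazy(model, device=None):
    # Hypothetical variant: nothing is moved at creation time ...
    def _inference(engine, batch):
        if device:
            # ... the move happens on every step instead (a no-op if the
            # model is already on the requested device).
            model.to(device)
        model.eval()
        with torch.no_grad():
            x, y = batch
            if device:
                x, y = x.to(device), y.to(device)
            return model(x), y

    return Engine(_inference)
```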
Thanks!
Top GitHub Comments
yes, please
On Tue, Mar 10, 2020, 09:47 kai-tub notifications@github.com wrote:
Yes, it seems like it would be better not to move the model in the trainer. In my case, I only used these evaluators after the training procedure, so there was no need to move the model back in between steps. But I initialized both at the same time, and that is how I noticed the “bug”.
I don’t see a good way to still move the model inside the function. Right now, I would favor removing the model-moving step and updating the docs.
But I can imagine that a lot of code relies on this behavior to save the one line of moving the model before initializing the optimizer. What is your opinion on catching the error caused by mismatched devices and adding more detail to the exception? Or should the docs suffice? Or do you want to add a deprecation warning?
Do you have an alternative approach?
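For reference, one way to work around the current behavior without changing ignite is to move the model yourself before each run, e.g. with a STARTED handler (a sketch that reuses the placeholder evaluator and model names from the example above):

```python
from ignite.engine import Events

# Move the model to the matching device just before each evaluator starts.
@cpu_evaluator.on(Events.STARTED)
def _to_cpu(engine):
    model.to("cpu")

@gpu_evaluator.on(Events.STARTED)
def _to_gpu(engine):
    model.to("cuda")
```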