Running Demo (MultiSaver) on Windows
I spent many hours trying to get this to work under Windows. I have it working now, so this write-up is probably useful to others.
Setup
The first obstacle is the readline Python package, which seems to be available by default on Unix systems, but not on Windows. For this, simply install the pyreadline package, which is a Windows port of readline.
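A quick way to check whether that import will succeed before launching main.py is to ask Python whether a module named readline can be found. This is only a minimal sketch: the helper name readline_available is made up for illustration, and it assumes pyreadline exposes itself under the module name readline, which is my understanding of that package.

```python
import importlib.util


def readline_available() -> bool:
    """Return True if a module named `readline` can be located."""
    # find_spec only locates a top-level module; it does not import it.
    return importlib.util.find_spec("readline") is not None


if __name__ == "__main__":
    if readline_available():
        print("readline found")
    else:
        print("readline missing; on Windows, try: pip install pyreadline")
```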
Understanding the command-line
Example command: python main.py --save_dir REDS_L1 --demo_input_dir d:/datasets/motion47set/noise_only --demo_output_dir ../results/motion47set
Explanation: specifying --demo_input_dir (or --demo true) will run an evaluation, using a pretrained model as specified in --save_dir. Every image of my motion47set will be evaluated. The results will be saved alongside the folders src and experiments at the project root, in a folder results/motion47set.
Note that even getting this far is not very intuitive, as others have already pointed out. Usually there is a separate Python script just for evaluation/testing/inference. The term demo is also a bit unusual; at first I was expecting an interactive demonstration of some form. I also initially used save_dir for what demo_output_dir actually does.
Another word of caution: if the output path is given without any ., it somehow ends up saving the results at d:/results/motion47set, i.e. at the root of the drive the project is located on, which again took me a while to figure out. I suggest printing the absolute output dir with os.path.abspath to the user at some point, for clarity.
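The suggestion about os.path.abspath could look roughly like the following. This is a sketch rather than the project's code: the function name resolve_output_dir is hypothetical, and the message only imitates the "results are saved in" line the program already prints.

```python
import os


def resolve_output_dir(output_dir: str) -> str:
    """Return the absolute form of output_dir and print it for the user."""
    # abspath resolves the path against the current working directory,
    # so the user sees exactly where the files will end up.
    absolute = os.path.abspath(output_dir)
    print(f"results are saved in {absolute}")
    return absolute


if __name__ == "__main__":
    resolve_output_dir("../results/motion47set")
```

Printing the resolved path once at startup makes a surprising location such as the drive root visible before any image is processed.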
Bug
Running the above command will produce the following output:
===> Loading demo dataset: Demo
Loading model from ../experiment\REDS_L1\models\model-200.pt
Loading optimizer from ../experiment\REDS_L1\optim\optim-200.pt
Loss function: 1*L1
Metrics: PSNR,SSIM
Loading loss record from ../experiment\REDS_L1\loss.pt
===> Initializing trainer
results are saved in ../results/motion47set
| | 0/90 [00:00<?, ?it/s]Can't pickle local object 'MultiSaver.begin_background.<locals>.t'
|██▏ | 4/90 [00:06<02:14, 1.56s/it]Traceback (most recent call last):
File "<string>", line 1, in <module>
File "d:\Program Files\Anaconda3\envs\torch gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "d:\Program Files\Anaconda3\envs\torch gpu\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
|█████▋ | 11/90 [00:10<01:12, 1.09it/s]forrtl: error (200): program aborting due to control-C event
Also note that ctrl+c takes a really long time to terminate for me, and even slows down my entire machine for several seconds.
This is difficult to debug, because there is no fatal exception and everything seems to run normally, ignoring the errors, which might also just be warnings, for all we know. I did not realize for a while that MultiSaver is part of this project, which is why there is not much help online regarding this error/warning. Second, the only thing that gives a somewhat stronger hint that this is an error, and not a warning, is the EOFError, and I still don’t know why or where it even happens. A large part of my debugging time went into assuming these were just warnings and trying to fix the command-line arguments instead, since those are easy to get wrong.
What is actually happening is that the MultiSaver code runs cleanly on the main thread, but each spawned thread/process then fails without the main thread being aware. As a result, the program runs through and attempts to save the output images, which does nothing since the threads/processes have already died. I’m not sure how to achieve this, but it would be nice if the program stopped running when it is unable to save output images (at least in demo mode, where that’s about the only purpose).
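One possible way to get that fail-fast behaviour is to check the background processes before queueing each image and raise if any of them has already exited. The sketch below is an assumption about how this could be wired in, not something the project does. The function name raise_if_workers_died is hypothetical, and it relies only on is_alive() and exitcode, which multiprocessing.Process provides.

```python
def raise_if_workers_died(processes):
    """Raise RuntimeError if any background process has already exited.

    `processes` is any iterable of objects with `is_alive()` and `exitcode`,
    such as multiprocessing.Process instances that have been started.
    """
    dead = [p for p in processes if not p.is_alive()]
    if dead:
        codes = [p.exitcode for p in dead]
        raise RuntimeError(
            f"{len(dead)} background saver process(es) exited early "
            f"(exit codes: {codes}); output images would not be written"
        )
```

Calling this right before each save would turn the silent failure into a visible exception on the main thread.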
The keywords to locate the actual issue here are pickle and multiprocessing. Going into utils.py and looking at the class MultiSaver shows us a method begin_background, with a method-local variable t (another function). Defining that function works; however, under Windows, where multiprocessing starts child processes with the spawn method, it has to be pickled/serialized to hand it over to the mp.Process, which will run it in a different process. This fails because pickle does not support local objects.
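The pickling failure can be reproduced without multiprocessing at all. In this small sketch, make_local_function and can_pickle are names invented for illustration. The first mimics how begin_background defines t inside another function, and the second reports whether pickle accepts an object.

```python
import pickle


def make_local_function():
    """Mimic begin_background: define t inside another function."""
    def t(queue):
        pass
    return t


def can_pickle(obj) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exception type raised for local objects differs between
        # Python versions, so several are caught here.
        return False


if __name__ == "__main__":
    print("local function picklable:", can_pickle(make_local_function()))
    print("builtin len picklable:", can_pickle(len))
```

Functions are pickled by their qualified name, and a name containing <locals> cannot be looked up again from the child process, which is why the local t is rejected.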
I tried various ways to change the scope of t:
- put global t before the definition of t (no change)
- move t to the outermost scope of the file utils.py, i.e. the same level as MultiSaver (can pickle the function, but later fails at a different point)
- the solution that works is putting t directly inside the class body of MultiSaver, and annotating it with @staticmethod. The annotation prevents the first method parameter from being used as self.
So my modification looks like this:

    class MultiSaver():
        ...
        @staticmethod
        def t(queue):
            ...
        def begin_background(self):
            self.queue = mp.Queue()
            worker = lambda: mp.Process(target=MultiSaver.t, args=(self.queue,), daemon=False)
            ...
        ...
After this change, everything works as expected. I haven’t tested it, but I suspect this will still work under Unix as well.
I’m not sure if this will work if multiple instances of MultiSaver are created; maybe that would give the same result as moving t to the outermost scope, i.e. fail again.
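For reference, here is a self-contained sketch of the static-method pattern. It is a simplified stand-in, not the project's utils.py: the n_workers parameter, the save and end_background methods, the None sentinel, and writing raw bytes to a path are all assumptions made so the example is complete. Since the static method receives its queue as an argument and each instance creates its own queue in begin_background, I would expect several instances to stay independent, but I have not verified this against the project.

```python
import multiprocessing as mp


class MultiSaver:
    """Hypothetical, simplified stand-in for the project's MultiSaver."""

    def __init__(self, n_workers=2):
        self.n_workers = n_workers
        self.queue = None
        self.processes = []

    @staticmethod
    def t(queue):
        # Defined at class level, so pickle can find it by its qualified
        # name; as a static method it takes no `self`.
        while True:
            item = queue.get()
            if item is None:  # sentinel: stop this worker
                break
            path, payload = item
            with open(path, "wb") as f:
                f.write(payload)

    def begin_background(self):
        # Each instance gets its own queue, passed to the workers explicitly.
        self.queue = mp.Queue()
        self.processes = [
            mp.Process(target=MultiSaver.t, args=(self.queue,), daemon=False)
            for _ in range(self.n_workers)
        ]
        for p in self.processes:
            p.start()

    def save(self, path, payload):
        self.queue.put((path, payload))

    def end_background(self):
        # One sentinel per worker, then wait for all of them to finish.
        for _ in self.processes:
            self.queue.put(None)
        for p in self.processes:
            p.join()
```

On Windows, the code that creates a MultiSaver and calls begin_background should sit under an if __name__ == "__main__": guard, because the spawn start method re-imports the main module in every child process.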
@mj9 Hi, can you share your utils.py with us? Please post the contents of the file directly in this thread. I’ve been struggling for days with this problem when running on Windows 10. Thank you. My code looks like this
And my console output is
I don’t have access to the code currently, but note how your method t is inside another function. You have to put it directly under MultiSaver, i.e. in the class body rather than inside begin_background.