Training Not Initiating (Windows 10)
OS: Windows 10
GPU: GTX 1060
Everything appears to run fine up until “.train” is hit, then everything comes to a halt.
`[00:00:00] Reading files █████████████████████████████████████████████████████████████████████████ 100
[00:00:01] Tokenize words █████████████████████████████████████████████████████████████████████████ 15057 / 15057
[00:00:00] Count pairs █████████████████████████████████████████████████████████████████████████ 15057 / 15057
[00:00:00] Compute merges █████████████████████████████████████████████████████████████████████████ 4743 / 4743
INFO:aitextgen.tokenizers:Saving aitextgen-vocab.json and aitextgen-merges.txt to the current directory. You will need both files to build the GPT2Tokenizer.
INFO:aitextgen:Constructing GPT-2 model from provided config.
INFO:aitextgen:Using a custom tokenizer.
GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
No environment variable for node rank defined. Set as 0.
WARNING:lightning:No environment variable for node rank defined. Set as 0.
CUDA_VISIBLE_DEVICES: [0]
INFO:lightning:CUDA_VISIBLE_DEVICES: [0]
0%| | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "shootme.py", line 16, in <module>
    ai.train(data, batch_size=16, num_steps=5000)
  File "Z:\0__0\0_seo\aitextgen\aitextgen\aitextgen.py", line 563, in train
    trainer.fit(train_model)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 347, in train
    self.run_training_epoch()
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 406, in run_training_epoch
    enumerate(_with_is_last(train_dataloader)), "get_train_batch"
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\profiler\profilers.py", line 64, in profile_iterable
    value = next(iterator)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 800, in _with_is_last
    it = iter(iterable)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users_\Anaconda3\envs\aitext\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users_\Anaconda3\envs\aitext\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users_\Anaconda3\envs\aitext\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "Z:\0__0\0_seo\aitextgen\shootme.py", line 16, in <module>
    ai.train(data, batch_size=16, num_steps=5000)
  File "Z:\0__0\0_seo\aitextgen\aitextgen\aitextgen.py", line 563, in train
    trainer.fit(train_model)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 347, in train
    self.run_training_epoch()
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 406, in run_training_epoch
    enumerate(_with_is_last(train_dataloader)), "get_train_batch"
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\profiler\profilers.py", line 64, in profile_iterable
    value = next(iterator)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 800, in _with_is_last
    it = iter(iterable)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users_\Anaconda3\envs\aitext\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
0%| | 0/5000 [00:06<?, ?it/s] 0%| | 0/5000 [00:00<?, ?it/s]`
@minimaxir @gldstrrbt so I tried being a total idiot and read the error message which told me how to fix it. I added
if __name__ == '__main__':
at the top of my code, indented the rest, and it's training right now.

Another possibility is that it's a Windows thing (another one of my repos had an issue with subprocesses on Windows that I never found a fix for).
At the least, it might not be an issue with aitextgen specifically; not sure if there's an easy solution. (Need to get a Windows machine to test at some point.)
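For anyone landing here later, here is a minimal sketch of the `if __name__ == '__main__':` guard mentioned in this thread. It uses only the standard library, not aitextgen, and `count_words`, `worker`, and `main` are made-up names for illustration. It asks for the `spawn` start method explicitly, which is the one Windows uses, so the same behaviour should show up on other platforms when the file is run as a script.

```python
import multiprocessing as mp


def count_words(text):
    """Hypothetical stand-in for real per-worker work (e.g. loading a batch)."""
    return len(text.split())


def worker(text):
    # Runs inside the child process.
    print(f"{text!r} has {count_words(text)} words")


def main():
    # "spawn" starts a fresh interpreter that re-imports this file,
    # which is why module-level code must be safe to run twice.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=worker, args=("hello from the child process",))
    proc.start()
    proc.join()
    print("child exit code:", proc.exitcode)


if __name__ == "__main__":
    # Only the original process enters this block. The spawned child imports
    # this file under a different module name, so it defines the functions
    # above but does not call main() (and try to start a process) again.
    mp.freeze_support()  # no effect unless frozen into a Windows executable
    main()
```

In a script like the one in the traceback, the equivalent change would presumably be to move everything that leads up to and includes `ai.train(data, batch_size=16, num_steps=5000)` under the guard (or into a `main()` function called from it), so the worker processes started during training do not re-run the training call on import.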