[CLI] pytorch-lightning and wandb - Abnormal program exit
Hello,
Description
I implemented a model with pytorch_lightning and `WandbLogger`, but it seems wandb makes the system crash before the start of training. The issue looks similar to this one: https://github.com/wandb/client/issues/1293
Wandb features
I am using `WandbLogger` from `pytorch_lightning.loggers`. I was not sure whether I should use only `WandbLogger`, or both `WandbLogger` and `wandb` (the Python library). I only did:

```python
from pytorch_lightning.loggers import WandbLogger
```

and removed:

```python
import wandb
```

I used to initialize the logger like this:
```python
wandb_logger = WandbLogger(project='april_sanbox',
                           config={
                               "machine": socket.gethostname(),
                               "env": os.environ['CONDA_DEFAULT_ENV'],
                               "lr": float(args.lr),
                               "epsilon": float(args.eps),
                               "epochs": args.epochs,
                               "model": '',
                           })
```
But I just kept `wandb_logger = WandbLogger(project='april_sanbox')` in order to be as close as possible to the examples given in:
https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch-lightning/Supercharge_your_Training_with_Pytorch_Lightning_%2B_Weights_%26_Biases.ipynb
https://colab.research.google.com/drive/16d1uctGaw2y9KhGBlINNTsWpmlXdJwRW?usp=sharing
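As a side note, `WandbLogger` forwards extra keyword arguments to `wandb.init`, so a config dictionary can be attached through the logger alone, without importing `wandb`. A minimal sketch (the `args` namespace is assumed to come from `argparse`, as in the report):

```python
import os
import socket

from pytorch_lightning.loggers import WandbLogger

# Extra keyword arguments such as `config` are passed through to wandb.init,
# so no separate `import wandb` / `wandb.init(...)` call is needed.
wandb_logger = WandbLogger(
    project='april_sanbox',
    config={
        "machine": socket.gethostname(),
        "env": os.environ['CONDA_DEFAULT_ENV'],  # assumes a conda environment
        "lr": float(args.lr),                    # `args` assumed from argparse
        "epsilon": float(args.eps),
        "epochs": args.epochs,
    },
)
```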
How to reproduce
The training loop:
```python
wandb_logger = WandbLogger(project='april_sanbox')
'''
config={
    "machine": socket.gethostname(),
    "env": os.environ['CONDA_DEFAULT_ENV'],
    "toy": int(args.toy),
    "doc_len": args.max_seq_len_doc,
    "parag_len": args.max_seq_len_parag,
    "parags_nbr": args.max_nbr_parags,
    "batch_size": args.bs,
    "lr": float(args.lr),
    "epsilon": float(args.eps),
    "epochs": args.epochs,
    "model": '',
})
wandb.init(project='april_sanbox',
           config={
               "machine": socket.gethostname(),
               "env": os.environ['CONDA_DEFAULT_ENV'],
               "toy": int(args.toy),
               "doc_len": args.max_seq_len_doc,
               "parag_len": args.max_seq_len_parag,
               "parags_nbr": args.max_nbr_parags,
               "batch_size": args.bs,
               "lr": float(args.lr),
               "epsilon": float(args.eps),
               "epochs": args.epochs,
               "model": '',
           })
'''
# [...]
tokenizer = CamembertTokenizerFast.from_pretrained('camembert-base')
train_valid_test_data_module = QHLD_DataModule(df_train=df_train, df_valid=df_valid, df_test=df_test,
                                               tokenizer=tokenizer,
                                               batch_size=args.bs,
                                               num_workers=cpu_count() - 1)
# [...]
from aux_20F_train import VanillaCamembert
model = VanillaCamembert(nbr_targets=len(list_labels))
# wandb.config.update({"model": model.model_name}, allow_val_change=True)
# wandb.watch(model)
##########################################################
### TRAINING
trainer = pl.Trainer(logger=wandb_logger,
                     max_epochs=args.epochs,
                     gpus=torch.cuda.device_count())
trainer.fit(model=model,
            datamodule=train_valid_test_data_module)
```
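One detail worth flagging in the snippet above (offered as context on Lightning 1.2.x defaults, not something stated in the report): with `gpus=2` and no `accelerator` argument, the trainer falls back to `ddp_spawn`, which re-imports the main module in every child process. A hypothetical sketch of making the choice explicit:

```python
# Hypothetical variant: request plain ddp instead of the implicit ddp_spawn
# default, so child processes are launched by re-running the script rather
# than by spawn-reimporting the main module (the failure mode shown in the
# traceback below).
trainer = pl.Trainer(logger=wandb_logger,
                     max_epochs=args.epochs,
                     gpus=torch.cuda.device_count(),
                     accelerator="ddp")
```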
The model:
```python
class VanillaCamembert(pl.LightningModule):
    def __init__(self, nbr_targets):
        super().__init__()
        self.nbr_targets = nbr_targets
        self.model_name = 'vanilla_camembert'
        self.bert = CamembertForSequenceClassification.from_pretrained('camembert-base',
                                                                       num_labels=self.nbr_targets)
        print(self.bert.config)
        self.save_hyperparameters()

    def forward(self, batch_input_ids, batch_att_masks):
        out = self.bert(batch_input_ids, batch_att_masks)
        return out.logits

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-5)
        return optimizer

    def training_step(self, train_batch, batch_idx):
        batch_input_ids, batch_att_masks, _, _, _, y_true = train_batch
        y_pred = self(batch_input_ids, batch_att_masks)
        loss = F.binary_cross_entropy_with_logits(y_pred, y_true.float())
        self.log('train_loss', loss, on_epoch=True)
        return loss

    def validation_step(self, valid_batch, batch_idx):
        batch_input_ids, batch_att_masks, _, _, _, y_true = valid_batch
        y_pred = self(batch_input_ids, batch_att_masks)
        loss = F.binary_cross_entropy_with_logits(y_pred, y_true.float())
        self.log('val_loss', loss)
```
Environment
- OS: Debian GNU/Linux 9 (stretch); Kernel: Linux 4.9.0-14-amd64; Architecture: x86-64
- Environment: Anaconda (running the script in a terminal)
```
# packages in environment at /u/salaunol/anaconda3/envs/h21_cuda101:
#
# Name  Version  Build  Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
absl-py 0.12.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.7.4.post0 pypi_0 pypi
argon2-cffi 20.1.0 py38h25fe258_2 conda-forge
async-timeout 3.0.1 py_1000 conda-forge
async_generator 1.10 py_0 conda-forge
attrs 20.3.0 pyhd3deb0d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
blas 1.0 mkl
bleach 3.3.0 pyh44b312d_0 conda-forge
blinker 1.4 py_1 conda-forge
brotlipy 0.7.0 py38h27cfd23_1003
c-ares 1.17.1 h7f98852_1 conda-forge
ca-certificates 2021.1.19 h06a4308_1
cachetools 4.2.1 pyhd8ed1ab_0 conda-forge
certifi 2020.12.5 py38h06a4308_0
cffi 1.14.5 py38h261ae71_0
chardet 4.0.0 py38h06a4308_1003
click 7.1.2 pyhd3eb1b0_0
configparser 5.0.2 pypi_0 pypi
cryptography 3.4.6 py38hd23ed53_0
cudatoolkit 10.1.243 h6bb024c_0
dataclasses 0.8 pyh6d0b6a4_7
decorator 4.4.2 py_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
docker-pycreds 0.4.0 pypi_0 pypi
emoji 1.2.0 pypi_0 pypi
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
filelock 3.0.12 pyhd3eb1b0_1
freetype 2.10.4 h5ab3b9f_0
fsspec 0.8.7 pyhd8ed1ab_0 conda-forge
future 0.18.2 py38h578d9bd_3 conda-forge
gitdb 4.0.5 pypi_0 pypi
gitpython 3.1.14 pypi_0 pypi
google-auth 1.28.0 pyh44b312d_0 conda-forge
google-auth-oauthlib 0.4.4 pypi_0 pypi
grpcio 1.36.1 pypi_0 pypi
icu 58.2 hf484d3e_1000 conda-forge
idna 2.10 pyhd3eb1b0_0
importlib-metadata 3.7.3 py38h578d9bd_0 conda-forge
intel-openmp 2020.2 254
ipykernel 5.5.0 py38h81c977d_1 conda-forge
ipython 7.21.0 py38h81c977d_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jedi 0.18.0 py38h578d9bd_2 conda-forge
jinja2 2.11.3 pyh44b312d_0 conda-forge
joblib 1.0.1 pyhd3eb1b0_0
jpeg 9b h024ee3a_2
jsonschema 3.2.0 pyhd8ed1ab_3 conda-forge
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_contrib_core 0.3.3 py_2 conda-forge
jupyter_contrib_nbextensions 0.5.1 pyhd8ed1ab_2 conda-forge
jupyter_core 4.7.1 py38h578d9bd_0 conda-forge
jupyter_highlight_selected_word 0.2.0 py38h578d9bd_1002 conda-forge
jupyter_latex_envs 1.4.6 pyhd8ed1ab_1002 conda-forge
jupyter_nbextensions_configurator 0.4.1 py38h578d9bd_2 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgfortran-ng 7.3.0 hdf63c60_0
libgomp 9.3.0 h2828fa1_18 conda-forge
libpng 1.6.37 hbc83047_0
libprotobuf 3.14.0 h8c45485_0
libsodium 1.0.18 h36c2ea0_1 conda-forge
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.2.0 h3942068_0
libuv 1.40.0 h7b6447c_0
libwebp-base 1.2.0 h27cfd23_0
libxml2 2.9.10 hb55368b_3
libxslt 1.1.34 hc22bd24_0
lxml 4.6.2 py38h9120a33_0
lz4-c 1.9.3 h2531618_0
markdown 3.3.4 pyhd8ed1ab_0 conda-forge
markupsafe 1.1.1 py38h8df0ef7_2 conda-forge
mistune 0.8.4 py38h25fe258_1002 conda-forge
mkl 2020.2 256
mkl-service 2.3.0 py38he904b0f_0
mkl_fft 1.3.0 py38h54f3939_0
mkl_random 1.1.1 py38h0573a6f_0
multidict 5.1.0 py38h497a2fe_1 conda-forge
nbclient 0.5.3 pyhd8ed1ab_0 conda-forge
nbconvert 6.0.7 py38h578d9bd_3 conda-forge
nbformat 5.1.2 pyhd8ed1ab_1 conda-forge
ncurses 6.2 he6710b0_1
nest-asyncio 1.4.3 pyhd8ed1ab_0 conda-forge
ninja 1.10.2 py38hff7bd54_0
notebook 6.2.0 py38h578d9bd_0 conda-forge
numpy 1.19.2 py38h54aff64_0
numpy-base 1.19.2 py38hfa32c7d_0
oauthlib 3.1.0 pypi_0 pypi
olefile 0.46 py_0
openssl 1.1.1k h27cfd23_0
packaging 20.9 pyhd3eb1b0_0
pandas 1.2.3 py38ha9443f7_0
pandoc 2.12 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parso 0.8.1 pyhd8ed1ab_0 conda-forge
pathtools 0.1.2 pypi_0 pypi
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.1.2 py38he98fc37_0
pip 21.0.1 py38h06a4308_0
prometheus_client 0.9.0 pyhd3deb0d_0 conda-forge
promise 2.3 pypi_0 pypi
prompt-toolkit 3.0.17 pyha770c72_0 conda-forge
protobuf 3.14.0 py38h2531618_1
psutil 5.8.0 pypi_0 pypi
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.8 pypi_0 pypi
pycparser 2.20 py_2
pygments 2.8.1 pyhd8ed1ab_0 conda-forge
pyjwt 2.0.1 pyhd8ed1ab_1 conda-forge
pyopenssl 20.0.1 pyhd3eb1b0_1
pyparsing 2.4.7 pyhd3eb1b0_0
pyrsistent 0.17.3 py38h25fe258_1 conda-forge
pysocks 1.7.1 py38h06a4308_0
python 3.8.8 hdb3f193_4
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.8 1_cp38 huggingface
pytorch 1.7.1 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
pytorch-lightning 1.2.6 pyhd8ed1ab_0 conda-forge
pytz 2021.1 pyhd3eb1b0_0
pyyaml 5.3.1 pypi_0 pypi
pyzmq 19.0.2 py38ha71036d_2 conda-forge
readline 8.1 h27cfd23_0
regex 2020.11.13 py38h27cfd23_0
requests 2.25.1 pyhd3eb1b0_0
requests-oauthlib 1.3.0 pyh9f0ad1d_0 conda-forge
rsa 4.7.2 pyh44b312d_0 conda-forge
sacremoses master py_0 huggingface
scikit-learn 0.24.1 py38ha9443f7_0
scipy 1.6.2 py38h91f5cce_0
send2trash 1.5.0 py_0 conda-forge
sentry-sdk 1.0.0 pypi_0 pypi
setuptools 52.0.0 py38h06a4308_0
shortuuid 1.0.1 pypi_0 pypi
six 1.15.0 py38h06a4308_0
smmap 3.0.5 pypi_0 pypi
sqlite 3.35.1 hdfb4753_0
subprocess32 3.5.4 pypi_0 pypi
tensorboard 2.4.1 pyhd8ed1ab_0 conda-forge
tensorboard-plugin-wit 1.8.0 pyh44b312d_0 conda-forge
terminado 0.9.2 py38h578d9bd_0 conda-forge
testpath 0.4.4 py_0 conda-forge
threadpoolctl 2.1.0 pyh5ca1d4c_0
tk 8.6.10 hbc83047_0
tokenizers 0.10.1 py38_0 huggingface
torchaudio 0.7.2 py38 pytorch
torchmetrics 0.2.0 pyhd8ed1ab_0 conda-forge
torchvision 0.8.2 py38_cu101 pytorch
tornado 6.1 py38h25fe258_0 conda-forge
tqdm 4.59.0 pyhd3eb1b0_1
traitlets 5.0.5 py_0 conda-forge
transformers 4.4.2 py_0 huggingface
typing-extensions 3.7.4.3 0 conda-forge
typing_extensions 3.7.4.3 py_0 conda-forge
urllib3 1.26.4 pyhd3eb1b0_0
wandb 0.10.25 pypi_0 pypi
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
werkzeug 1.0.1 pyh9f0ad1d_0 conda-forge
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.6.3 py38h497a2fe_1 conda-forge
zeromq 4.3.3 he6710b0_3
zipp 3.4.1 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0
```
- Python Version: Python 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0] :: Anaconda, Inc. on linux
Terminal output
```
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
wandb: Currently logged in as: osalaun (use `wandb login --relogin` to force relogin)
Problem at: /u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py 155 experiment
Traceback (most recent call last):
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 742, in init
run = wi.init()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 421, in init
backend.ensure_launched()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 125, in ensure_launched
self.wandb_process.start()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 742, in init
run = wi.init()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 421, in init
backend.ensure_launched()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 125, in ensure_launched
self.wandb_process.start()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/u/salaunol/Documents/_2021_hiver/QHLD/20F_train.py", line 218, in <module>
trainer.fit(model=model,
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 496, in fit
self.pre_dispatch()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 530, in pre_dispatch
self.logger.log_hyperparams(self.lightning_module.hparams_initial)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 42, in wrapped_fn
return fn(*args, **kwargs)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py", line 184, in log_hyperparams
self.experiment.config.update(params, allow_val_change=True)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 41, in experiment
return get_experiment() or DummyExperiment()
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 42, in wrapped_fn
return fn(*args, **kwargs)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 39, in get_experiment
return fn(self)
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py", line 155, in experiment
self._experiment = wandb.init(
File "/u/salaunol/anaconda3/envs/h21_cuda101/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 779, in init
six.raise_from(Exception("problem"), error_seen)
File "<string>", line 3, in raise_from
Exception: problem
```
Edit: added terminal output
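For context, the RuntimeError above is Python's standard complaint when a process started with the `spawn` method re-imports a main module that launches work at import time. A minimal sketch of the guard suggested by the error message itself, assuming a script laid out like 20F_train.py (illustrative, not the reporter's actual file):

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger


def main():
    # Build the logger, data module, model, and trainer here instead of at
    # module top level, so a spawn re-import of this file does nothing.
    wandb_logger = WandbLogger(project='april_sanbox')
    trainer = pl.Trainer(logger=wandb_logger, max_epochs=1, gpus=1)  # assumed values
    # trainer.fit(model=model, datamodule=train_valid_test_data_module)


if __name__ == '__main__':
    main()
```

A related workaround mentioned on similar wandb threads is to change how the wandb backend starts its internal process, e.g. `wandb.init(settings=wandb.Settings(start_method="fork"))`, provided the installed wandb version supports the `start_method` setting.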
@borisdayma I have not updated `transformers` yet, but I think I found the origin of the problem. When initializing the Lightning `Trainer`, I passed `gpus=torch.cuda.device_count()` (as shown in the training loop above). Although the machine has two GPUs, only one is actually suitable for training (the other one is just for the monitor). Still, `torch.cuda.device_count()` returns `2`. If I manually set `gpus=1`, then the problem is solved.
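A minimal sketch of that fix, using the same names as the reproduction above; the `CUDA_VISIBLE_DEVICES` variant is a hypothetical alternative, not something the reporter stated:

```python
# Train on a single GPU so Lightning does not pick a multi-process backend.
trainer = pl.Trainer(logger=wandb_logger,
                     max_epochs=args.epochs,
                     gpus=1)

# Hypothetical alternative: expose only the training GPU to the process,
# e.g. `CUDA_VISIBLE_DEVICES=0 python 20F_train.py`, so that
# torch.cuda.device_count() itself returns 1.
```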
Edit: @vanpelt I already double-checked `wandb.init`; there was none of that.

Hey @oliviersalaun,
Thanks for following up with the thread and helping with the solution. Would you like to close the thread if the issue has been resolved?