WandbLogger causes the program to crash without an error

🐛 Bug

In my simple setup, I have been using TensorBoardLogger and other loggers, and everything worked perfectly. However, when I switch to WandbLogger, the program exits without any error message. It just prints a dictionary and stops.

To Reproduce

{'_identity': (1,), '_config': {'authkey': b'i\x01FT\x8b\xe0\xf6e}\xe7\xce\xe7\xa1\xb5\x9a\x9bF2A?\x95\x95\xe1\x85I\x82a\xa4\xef4\xc3=', 'semprefix': '/mp'}, '_parent_pid': 4007964, '_parent_name': 'MainProcess', '_popen': None, '_closed': False, '_target': <function wandb_internal at 0x7f4792fd4550>, '_args': (), '_kwargs': {'settings': {'_args': [], '_cli_only_mode': None, '_colab': False, '_config_dict': None, '_console': <SettingsConsole.REDIRECT: 2>, '_cuda': None, '_disable_meta': None, '_disable_stats': None, '_disable_viewer': None, '_except_exit': None, '_executable': '/home/mazen/miniconda3/envs/jdt/bin/python', '_internal_check_process': 8, '_internal_queue_timeout': 2, '_jupyter': False, '_jupyter_name': None, '_jupyter_path': None, '_jupyter_root': None, '_kaggle': False, '_noop': False, '_offline': False, '_os': 'Linux-5.13.0-28-generic-x86_64-with-glibc2.31', '_platform': 'linux', '_python': '3.9.7', '_require_service': None, '_runqueue_item_id': None, '_save_requirements': True, '_service_transport': None, '_start_datetime': datetime.datetime(2022, 3, 25, 19, 17, 2, 616686), '_start_time': 1648261022.616686, '_tmp_code_dir': '/home/mazen/Projects/pl_jdt/output/wandb/run-20220325_191702-1wdniopy/tmp/code', '_tracelog': None, '_unsaved_keys': None, '_windows': False, 'allow_val_change': None, 'anonymous': None, 'api_key': None, 'base_url': 'https://api.wandb.ai', 'code_dir': None, 'config_paths': None, 'console': 'auto', 'deployment': 'cloud', 'disable_code': None, 'disable_git': False, 'disabled': False, 'docker': None, 'email': 'mazen.ota@gmail.com', 'entity': None, 'files_dir': '/home/mazen/Projects/pl_jdt/output/wandb/run-20220325_191702-1wdniopy/files', 'force': None, 'git_remote': 'origin', 'heartbeat_seconds': 30, 'host': 'mazen-HP-Z640-Workstation', 'ignore_globs': (), 'is_local': False, 'label_disable': None, 'launch': None, 'launch_config_path': None, 'log_dir': '/home/mazen/Projects/pl_jdt/output/wandb/run-20220325_191702-1wdniopy/logs', 
'log_internal': '/home/mazen/Projects/pl_jdt/output/wandb/run-20220325_191702-1wdniopy/logs/debug-internal.log', 'log_symlink_internal': '/home/mazen/Projects/pl_jdt/output/wandb/debug-internal.log', 'log_symlink_user': '/home/mazen/Projects/pl_jdt/output/wandb/debug.log', 'log_user': '/home/mazen/Projects/pl_jdt/output/wandb/run-20220325_191702-1wdniopy/logs/debug.log', 'login_timeout': None, 'magic': None, 'mode': 'online', 'notebook_name': None, 'problem': 'fatal', 'program': '/home/mazen/Projects/pl_jdt/scripts/train.py', 'program_relpath': 'scripts/train.py', 'project': 'mnist_training_test', 'project_url': '', 'quiet': None, 'reinit': None, 'relogin': None, 'resume': 'allow', 'resume_fname': '/home/mazen/Projects/pl_jdt/output/wandb/wandb-resume.json', 'resumed': False, 'root_dir': '/home/mazen/Projects/pl_jdt/output', 'run_group': None, 'run_id': '1wdniopy', 'run_job_type': None, 'run_mode': 'run', 'run_name': None, 'run_notes': None, 'run_tags': None, 'run_url': '', 'sagemaker_disable': None, 'save_code': True, 'settings_system': '/home/mazen/.config/wandb/settings', 'settings_workspace': '/home/mazen/Projects/pl_jdt/output/wandb/settings', 'show_colors': None, 'show_emoji': None, 'show_errors': True, 'show_info': True, 'show_warnings': True, 'silent': False, 'start_method': None, 'strict': None, 'summary_errors': None, 'summary_warnings': 5, 'sweep_id': None, 'sweep_param_path': None, 'sweep_url': '', 'symlink': True, 'sync_dir': '/home/mazen/Projects/pl_jdt/output/wandb/run-20220325_191702-1wdniopy', 'sync_file': '/home/mazen/Projects/pl_jdt/output/wandb/run-20220325_191702-1wdniopy/run-1wdniopy.wandb', 'sync_symlink_latest': '/home/mazen/Projects/pl_jdt/output/wandb/latest-run', 'system_sample': 15, 'system_sample_seconds': 2, 'timespec': '20220325_191702', 'tmp_dir': '/home/mazen/Projects/pl_jdt/output/wandb/run-20220325_191702-1wdniopy/tmp', 'username': 'mazen', 'wandb_dir': '/home/mazen/Projects/pl_jdt/output/wandb/', '_log_level': 10}, 'record_q': 
<multiprocessing.queues.Queue object at 0x7f478fceb430>, 'result_q': <multiprocessing.queues.Queue object at 0x7f478fcfb100>, 'user_pid': 4007964}, '_name': 'wandb_internal'} <_io.BytesIO object at 0x7f478f477e50>

Expected behavior

I expect WandbLogger to work the same way the other loggers do.

Environment

* CUDA:
        - GPU:
                - NVIDIA TITAN X (Pascal)
                - NVIDIA TITAN X (Pascal)
        - available:         True
        - version:           11.3
* Packages:
        - numpy:             1.21.2
        - pyTorch_debug:     False
        - pyTorch_version:   1.11.0
        - pytorch-lightning: 1.5.10
        - tqdm:              4.62.3
* System:
        - OS:                Linux
        - architecture:
                - 64bit
                - ELF
        - processor:         x86_64
        - python:            3.9.7
        - version:           #31~20.04.1-Ubuntu SMP Wed Jan 19 14:08:10 UTC 2022
  • PyTorch Lightning Version (e.g., 1.5.0): 1.5.10
  • PyTorch Version (e.g., 1.10): 1.11.0
  • Python version (e.g., 3.9): 3.9.7
  • OS (e.g., Linux): Linux (#31~20.04.1-Ubuntu SMP Wed Jan 19 14:08:10 UTC 2022)
  • CUDA/cuDNN version: 11.3
  • GPU models and configuration: NVIDIA TITAN X (Pascal)
  • How you installed PyTorch (conda, pip, source): conda
  • Any other relevant information: My script works both on a single device and with DDP. I have been trying to get WandbLogger to work in both setups, but I hit the same failure each time.

Additional context

I am refactoring my code, so I have updated PyTorch to 1.11 and PyTorch Lightning to 1.5.10. I am developing the setup step by step so that I can verify everything works as expected. I am using Hydra and Rich as the only external libraries (besides PyTorch, PyTorch Lightning, torchvision, and torchmetrics).

cc @awaelchli @morganmcg1 @AyushExel @borisdayma @scottire @manangoel99

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
manangoel99 commented, Mar 26, 2022

Awesome! Thanks for updating us on this.

0 reactions
sudomaze commented, Mar 26, 2022

Same issue. I removed the pip version and installed the conda version, but the problem persists.

I have found the problem. While debugging the DDP approach manually, I left the following line in place:

print(process_obj.__dict__, fp); exit()

It was in python3.9/multiprocessing/popen_spawn_posix.py at line 46 (Popen._launch).

This is a bug in neither wandb nor PyTorch Lightning; I introduced it myself while debugging DDP to build my custom version. Interestingly, neither the other loggers nor PL's DDP ran into it.

Now everything is working! 😄

I do apologize for the confusion. Thanks for your help @morganmcg1 @manangoel99 @akihironitta (I will close the issue).
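
For anyone who lands here with the same symptom (an object dump followed by a silent exit), a quick check is to look for debugging leftovers in the installed copy of the standard library. The printed dictionary above names `wandb_internal` as the process target and ends with an `_io.BytesIO` object, which is consistent with a `print` placed inside `Popen._launch`, where `process_obj` and `fp` are both in scope. The following is a minimal sketch, not an official diagnostic: the helper name `find_debug_leftovers` and the list of suspicious calls are assumptions made for illustration, and the script assumes a POSIX system with the standard library sources available on disk.

```python
import inspect
import re

# Calls that usually do not belong in library code; this list is an
# illustrative assumption, not an exhaustive or official one.
SUSPICIOUS_CALL = re.compile(r"(?<!\w)(print|exit|quit|breakpoint|set_trace)\s*\(")


def find_debug_leftovers(source, first_line=1):
    """Return (line_number, stripped_line) pairs that contain a suspicious call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=first_line):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore lines that are entirely comments
        if SUSPICIOUS_CALL.search(stripped):
            hits.append((lineno, stripped))
    return hits


if __name__ == "__main__":
    # POSIX only: this is the module that launches 'spawn' child processes.
    import multiprocessing.popen_spawn_posix as psp

    lines, first = inspect.getsourcelines(psp.Popen._launch)
    leftovers = find_debug_leftovers("".join(lines), first)
    if leftovers:
        print(f"Possible debugging leftovers in {inspect.getsourcefile(psp)}:")
        for lineno, text in leftovers:
            print(f"  line {lineno}: {text}")
    else:
        print("Popen._launch looks clean.")
```

If the script reports a hit, restoring the original file (for example by reinstalling the Python interpreter in that environment) should remove the stray statement. The same helper can be pointed at any other function you edited while debugging.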
