wandb: Network error (SSLError), entering retry loop.

Issue description

The message "wandb: Network error (SSLError), entering retry loop." interferes with training. [Screenshot, 2022-12-18 21:17:56]

Current behavior

Training still runs and I can see the metrics in the wandb dashboard. The console eventually prints "wandb: Network error resolved after 0:06:24.504729, resuming normal operation." However, I think this really slows down training, because it occurs very frequently. The wandb debug.log contains: Caused by SSLError(SSLError(1, '[SSL: KRB5_S_TKT_NYV] unexpected eof while reading (_ssl.c:1091)
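To put a number on how much time the retry loop accounts for, the "resolved after" durations can be summed from the console output. This is only a sketch: the function name total_retry_time is hypothetical, the pattern is inferred from the single message quoted above, and the total measures how long wandb reported being disconnected, which is not necessarily the training time lost.

```python
import re
from datetime import timedelta

# Pattern inferred from the one message quoted in this issue, e.g.
# "wandb: Network error resolved after 0:06:24.504729, resuming normal operation."
# The optional "N day(s), " prefix covers how Python prints long timedeltas.
RESOLVED_RE = re.compile(
    r"Network error resolved after "
    r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})(?:\.(\d+))?"
)


def total_retry_time(log_lines):
    """Sum the outage durations reported in wandb 'resolved after' lines."""
    total = timedelta()
    for line in log_lines:
        match = RESOLVED_RE.search(line)
        if match is None:
            continue  # not a "resolved" line
        days, hours, minutes, seconds, fraction = match.groups()
        # Pad or trim the fractional part to exactly six digits (microseconds).
        micros = int((fraction or "0").ljust(6, "0")[:6])
        total += timedelta(
            days=int(days or 0),
            hours=int(hours),
            minutes=int(minutes),
            seconds=int(seconds),
            microseconds=micros,
        )
    return total


if __name__ == "__main__":
    sample = [
        "wandb: Network error (SSLError), entering retry loop.",
        "wandb: Network error resolved after 0:06:24.504729, resuming normal operation.",
    ]
    print(total_retry_time(sample))
```

Feeding it the lines of a saved training log (for example, open("train.log") with a file name of your choosing) gives a rough upper bound to compare against the total run time.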

wandb support said (April 2022):

This happens as a result of either (1) improper installation of SSL on your Python distro, as noted by some SO users here. I would recommend reinstalling Anaconda/your virtual environment and upgrading openssl.

(But I don’t think I have permission to do so on the NeuroPoly servers.)
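Before reinstalling anything, it may help to check which OpenSSL build the Python interpreter is actually linked against, since that needs no special permissions. The snippet below is a minimal sketch using only the standard library; the helper name ssl_report is hypothetical, and the output by itself does not prove or rule out the cause suggested by wandb support.

```python
import ssl
import sys


def ssl_report():
    """Collect basic facts about the SSL stack used by this interpreter."""
    paths = ssl.get_default_verify_paths()
    return {
        "python": sys.version.split()[0],
        # Version string of the OpenSSL library Python was built against.
        "openssl_version": ssl.OPENSSL_VERSION,
        # Default CA bundle locations; either may be None on some systems.
        "cafile": paths.cafile,
        "capath": paths.capath,
    }


if __name__ == "__main__":
    for key, value in ssl_report().items():
        print(f"{key}: {value}")
```

Running it once with the system Python and once inside the virtual environment used for training shows whether the two environments see different OpenSSL builds or certificate paths.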

Expected behavior

Training should run without interruption.

Steps to reproduce

Run a normal training: ivadomed --train -c config_Mod3DUnet_ax.json --path-data ../data/ --path-output ../results/ with the bavaria-quebec preprocessed data.

config file
{
    "command": "train",
    "gpu_ids": [0],
    "path_output": "../results/ax_output_run1",
    "model_name": "ModifiedUnet3d_singleContrast",
    "debugging": true,
    "object_detection_params": {
        "object_detection_path": null,
        "safety_factor": [1.0, 1.0, 1.0]
    },
    "wandb": {
        "wandb_api_key": "",
        "project_name": "bavaria",
        "group_name": "lesion_ax",
        "run_name": "ax_run1",
        "log_grads_every": 100
    },
    "loader_parameters": {
        "path_data": ["~/duke/temp/kiri/bavaria-preprocessed"],
        "subject_selection:": {"n": [], "metadata": [], "value": []},
        "target_suffix": ["_lesion-manual"],
        "extensions": [".nii.gz"],
        "roi_params": {
            "suffix": null,
            "slice_filter_roi": null
        },
        "contrast_params": {
            "training_validation": ["T2w"],
            "testing": ["T2w"],
            "balance": {}
        },
        "slice_filter_params": {
            "filter_empty_mask": false,
            "filter_empty_input": false
        },
        "slice_axis": "axial",
        "multichannel": false,
        "soft_gt": false
    },
    "split_dataset": {
        "fname_split": null,
        "random_seed": 42,
        "split_method": "participant_id",
        "data_testing": {"data_type": null, "data_value": []},
        "balance": null,
        "train_fraction": 0.6,
        "test_fraction": 0.2
    },
    "training_parameters": {
        "batch_size": 2,
        "loss": {
            "name": "DiceLoss"
        },
        "training_time": {
            "num_epochs": 100,
            "early_stopping_patience": 100,
            "early_stopping_epsilon": 0.001
        },
        "scheduler": {
            "initial_lr": 1e-3,
            "lr_scheduler": {
                "name": "CosineAnnealingLR",
                "base_lr": 1e-5,
                "max_lr": 1e-3
            }
        },
        "balance_samples": {"applied": false, "type": "gt"}
    },
    "default_model": {
        "name": "Unet",
        "dropout_rate": 0.3,
        "bn_momentum": 0.1,
        "final_activation": "sigmoid",
        "is_2d": false,
        "depth": 4
    },
    "Modified3DUNet": {
        "applied": true,
        "length_3D": [160, 160, 720],
        "stride_3D": [80, 80, 360],
        "attention": false,
        "n_filters": 3
    },
    "uncertainty": {
        "epistemic": false,
        "aleatoric": false,
        "n_it": 0
    },
    "postprocessing": {
        "binarize_prediction": {"thr": 0.5},
        "uncertainty": {"thr": -1, "suffix": "_unc-vox.nii.gz"}
    },
    "evaluation_parameters": {},
    "transformation": {
        "Resample": {
            "wspace": 0.5,
            "hspace": 0.5,
            "dspace": 1
        },
        "CenterCrop": {
            "size": [160, 160, 720]
        },
        "RandomAffine": {
            "degrees": 10,
            "scale": [0.3, 0.3, 0.3],
            "translate": [0.1, 0.1, 0.1],
            "applied_to": ["im", "gt"],
            "dataset_type": ["training"]
        },
        "ElasticTransform": {
            "alpha_range": [25.0, 35.0],
            "sigma_range": [3.5, 4.5],
            "p": 0.5,
            "applied_to": ["im", "gt"],
            "dataset_type": ["training"]
        },
        "RandomReverse": {
            "applied_to": ["im", "gt"],
            "dataset_type": ["training"]
        },
        "RandomGamma": {
            "log_gamma_range": [-1.5, 1.5],
            "p": 0.5,
            "applied_to": ["im"],
            "dataset_type": ["training"]
        },
        "RandomBiasField": {
            "coefficients": 0.5,
            "order": 3,
            "p": 0.3,
            "applied_to": ["im"],
            "dataset_type": ["training"]
        },
        "RandomBlur": {
            "sigma_range": [0.0, 1.0],
            "p": 0.3,
            "applied_to": ["im"],
            "dataset_type": ["training"]
        },
        "NumpyToTensor": {},
        "NormalizeInstance": {"applied_to": ["im"]}
    }
}
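One detail in the config above may deserve a second look: the key written as "subject_selection:" carries a colon inside the quotes. Whether ivadomed ignores or rejects such a key is not established in this thread, and it is not shown to be related to the SSL error. The sketch below is a generic check with a hypothetical helper name, suspicious_keys, that lists keys ending in a colon or padded with whitespace.

```python
import json


def suspicious_keys(node, path=""):
    """Return paths of JSON object keys that end in ':' or have stray whitespace."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}/{key}"
            if key != key.strip() or key.endswith(":"):
                found.append(here)
            found.extend(suspicious_keys(value, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(suspicious_keys(item, f"{path}/{index}"))
    return found


if __name__ == "__main__":
    # A small excerpt shaped like the config above, not the full file.
    excerpt = json.loads(
        '{"loader_parameters": {"subject_selection:": {"n": []},'
        ' "extensions": [".nii.gz"]}}'
    )
    print(suspicious_keys(excerpt))
```

To check the real file, load it with json.load(open("config_Mod3DUnet_ax.json")) and pass the result to the same function.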

Environment

System description

NeuroPoly server, Rosenberg, Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-53-generic x86_64)

Installed packages

on branch mhb/1213-fix-3d-data-augmentation from PR 1222

Output of pip freeze
absl-py==1.1.0
astor==0.8.1
astunparse==1.6.3
awscli==1.22.34
beniget==0.4.1
bids-validator==1.9.9
botocore==1.23.34
brz-etckeeper==0.0.0
cachetools==5.2.0
certifi==2020.6.20
chardet==4.0.0
click==8.0.3
colorama==0.4.4
coloredlogs==15.0.1
command-not-found==0.3
commonmark==0.9.1
cryptography==3.4.8
csv-diff==1.1
cycler==0.11.0
dbus-python==1.2.18
decorator==4.4.2
Deprecated==1.2.13
dictdiffer==0.9.0
dill==0.3.5.1
distlib==0.3.4
distro==1.7.0
distro-info===1.1build1
dnspython==2.1.0
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.17.1
filelock==3.6.0
flatbuffers==2.0.7
fonttools==4.33.3
formulaic==0.3.4
fsleyes==1.5.0
fsleyes-props==1.8.2
fsleyes-widgets==0.12.3
fslpy==3.9.5
gast==0.4.0
gitdb==4.0.10
GitPython==3.1.29
google-auth==2.8.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
gpg===1.16.0-unknown
grpcio==1.47.0
h5py==3.7.0
humanfriendly==10.0
humanize==4.4.0
idna==3.3
imageio==2.22.4
importlib-metadata==4.6.4
interface-meta==1.3.0
iotop==0.6
-e git+https://github.com/ivadomed/ivadomed.git@d6385f1c57b7433a57003167c215f2288db3b631#egg=ivadomed
Jinja2==3.1.2
jmespath==0.10.0
joblib==1.2.0
keras==2.11.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.3
libclang==14.0.1
loguru==0.6.0
Markdown==3.3.6
MarkupSafe==2.1.1
matplotlib==3.5.2
more-itertools==8.10.0
mpmath==1.2.1
netifaces==0.11.0
networkx==2.8.8
nibabel==3.2.2
num2words==0.5.12
numpy==1.23.5
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.0
onnxruntime==1.13.1
opt-einsum==3.3.0
osfclient==0.0.5
packaging==21.3
pandas==1.4.4
pathtools==0.1.2
Pillow==9.0.1
platformdirs==2.5.1
ply==3.11
promise==2.3
protobuf==3.19.4
psutil==5.9.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybids==0.15.5
Pygments==2.11.2
PyGObject==3.42.1
PyOpenGL==3.1.6
pyparsing==2.4.7
python-apt==2.3.0+ubuntu2.1
python-dateutil==2.8.1
pythran==0.10.0
pytz==2022.6
PyWavelets==1.4.1
PyYAML==5.4.1
requests==2.25.1
requests-oauthlib==1.3.1
requests-toolbelt==0.9.1
rich==12.6.0
roman==3.3
rsa==4.8
s3transfer==0.5.0
scikit-image==0.19.3
scikit-learn==1.2.0
scipy==1.8.0
screen-resolution-extra==0.0.0
seaborn==0.12.1
sentry-sdk==1.11.1
setproctitle==1.3.2
shellingham==1.5.0
shortuuid==1.0.11
SimpleITK==2.2.1
six==1.16.0
smmap==5.0.0
SQLAlchemy==1.3.24
ssh-import-id==5.11
sympy==1.11.1
tensorboard==2.11.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.11.0
tensorflow-estimator==2.11.0
tensorflow-io-gcs-filesystem==0.26.0
termcolor==1.1.0
threadpoolctl==3.1.0
tifffile==2022.10.10
torch==1.11.0
torchaudio==0.13.0
torchio==0.18.86
torchvision==0.12.0
tqdm==4.64.0
typer==0.7.0
typing_extensions==4.2.0
ubuntu-drivers-common==0.0.0
ufw==0.36.1
unattended-upgrades==0.1
urllib3==1.26.13
virtualenv==20.13.0+ds
wandb==0.13.7
Werkzeug==2.1.2
wrapt==1.14.1
wxPython==4.0.7
xkit==0.0.0
zipp==1.0.0

Issue Analytics

  • State: open
  • Created 9 months ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

jcohenadad commented, Dec 19, 2022 (2 reactions)

Still, if you want the live mode (which is useful), we need to figure out what is wrong in your config. I don’t think it’s a network issue because I’m using the same computer and I don’t experience this issue.

kiristern commented, Dec 19, 2022 (1 reaction)

Thanks for the suggestion, @kanishk16. I no longer get the error message after setting ‘mode’ = ‘offline’.
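The comment above does not say where ‘mode’ was set. One generic way to get the same effect, independent of the training framework, is the WANDB_MODE environment variable that wandb reads at startup. The sketch below assumes the training code does not pass its own mode to wandb.init, which is not confirmed for ivadomed; the helper name force_wandb_offline is hypothetical.

```python
import os


def force_wandb_offline(env=None):
    """Set WANDB_MODE=offline in the given mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # In offline mode wandb writes run data to local files
    # instead of sending it over the network.
    env["WANDB_MODE"] = "offline"
    return env


# This has to run before wandb is initialised, i.e. before training starts.
force_wandb_offline()
```

The same variable can be exported in the shell before launching training. Runs recorded offline can be uploaded later with the wandb sync command, at the cost of losing the live dashboard mentioned in the previous comment.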
