Does CropForegroundd not support the 'ddp_spawn' distributed strategy?
Describe the bug
When I use the following transforms for distributed training, I get an error:
from monai.transforms import (
    Compose, LoadImaged, ScaleIntensityRanged, AddChanneld, CropForegroundd, Resized,
)

train_transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    ScaleIntensityRanged(
        keys=["image"], a_min=45, a_max=167,
        b_min=0.0, b_max=1.0, clip=True,
    ),
    AddChanneld(keys=["image", "label"]),
    CropForegroundd(
        keys=["image", "label"],
        source_key="label", select_fn=self.threshold_lager_one, margin=20,
    ),
    Resized(
        keys=["image", "label"], spatial_size=[256, 256, 256],
        mode=("trilinear", "nearest"), align_corners=(False, None),
    ),
])
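The pipeline passes `self.threshold_lager_one` as the `select_fn` of `CropForegroundd`; MONAI calls this function on the `source_key` array to get a boolean foreground mask and crops to its bounding box (plus the margin). A minimal sketch of such a callable, with the name and threshold assumed from the method name in the snippet:

```python
import numpy as np

def threshold_larger_one(img):
    # Hypothetical select_fn: treat voxels with intensity > 1 as foreground.
    # CropForeground(d) crops to the bounding box of the returned mask.
    return img > 1

# Example: a channel-first 3D label volume with a small foreground blob
label = np.zeros((1, 8, 8, 8))
label[0, 2:5, 2:5, 2:5] = 2
mask = threshold_larger_one(label)
print(int(mask.sum()))  # 27 foreground voxels (a 3x3x3 blob)
```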
The error is: "Default process group has not been initialized, please make sure to call init_process_group."
To Reproduce
Here is my pytorch_lightning Trainer setting:
trainer = pytorch_lightning.Trainer(
    gpus=[0, 1],
    strategy="ddp_spawn",
    max_epochs=50,
    logger=tb_logger,
    checkpoint_callback=True,
    num_sanity_val_steps=1,
    check_val_every_n_epoch=5,
    log_every_n_steps=1,
)
When I comment out CropForegroundd, distributed training works, which is strange.
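Not necessarily the root cause here (the reported error is about the default process group), but a common `ddp_spawn` pitfall worth ruling out: with the spawn strategy, each worker process receives the LightningModule, and with it the transform pipeline, via pickling, so any callable handed to a transform (such as `select_fn`) must itself be picklable. A quick self-contained check of what pickles and what does not:

```python
import pickle

# A function resolvable by qualified name (e.g. a module-level `def`,
# or a builtin like `len`) pickles by reference and transfers cleanly
# to spawned workers.
assert pickle.loads(pickle.dumps(len)) is len

# A lambda cannot be pickled at all; a bound method like
# `self.threshold_lager_one` only pickles if its instance does.
try:
    pickle.dumps(lambda x: x > 1)
    lambda_ok = True
except (pickle.PicklingError, AttributeError):
    lambda_ok = False
print(lambda_ok)  # False
```

If the pipeline only works once `CropForegroundd` is removed, checking that its `select_fn` (and the object it is bound to) survives a `pickle.dumps` round trip is a cheap first diagnostic.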
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
for 2d fake data
Of course! Thanks for your help, @Nic-Ma!