question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Distributed tests fail

See original GitHub issue

Recently while working on ignite.contrib.metrics.regression tests, I was trying to add distributed tests to some metrics that didn’t have distributed tests. While i was running those tests when world_size > 1 the result computed by the metric implemented in ignite differs from the one computed by sklearn(if imported from sklearn) / numpy. And here are examples for that:

  • Test wave hedges distance _test_distrib_integration() fails in case of world_size > 1 and if i didn’t comment all_gather() in _test_distrib_compute() it fails as well!

  • Test fractional absolute error The same failures in the same two tests _test_distrib_compute() and _test_distrib_integration().

related: #1284

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:7 (6 by maintainers)

github_iconTop GitHub Comments

2reactions
vfdev-5commented, Mar 14, 2021

@sdesrozis I think we can leave this task to @KickItLikeShika .

@KickItLikeShika check here: https://pytorch.org/ignite/metrics.html#metrics-and-distributed-computations and here : https://pytorch.org/ignite/distributed.html#ignite-distributed-utils for the list of collective ops. Please, try to figure out how it works by your own and ask if you tried everything and you become blocked for sometime. Thanks !

1reaction
sdesroziscommented, Mar 14, 2021

Ah ah take a look, the metrics are not distributed. It means a reduction is needed in the implementation. I could fix it tomorrow.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Troubleshooting - Distributed Load Testing on AWS
Issue: You are using an existing VPC and your tests fail with a status of Failed , resulting in the following error message:...
Read more >
Distributed tests fail with pytest · Issue #41337 - GitHub
I'm getting test failures in the distributed tests. The failing tests are: DistributedDataParallelTest.test_accumulate_gradients_module ...
Read more >
End-to-End Testing: Avoiding Software Failure in Distributed ...
End-to-end (E2E) testing is a technique which verifies that all the features of a system and its interconnected subsystems work as expected.
Read more >
On Eliminating Error in Distributed Software Systems
It is impossible to rely on testing to eliminate errors. A test cannot show an absence of error. A test simply proves a...
Read more >
The plan failed the 401(k) ADP and ACP nondiscrimination tests
Excess contributions result from plans failing to satisfy the ADP test and should be distributed to the applicable HCEs within 12 months following...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found