Error in distributed FIFO scheduler

If I run the distributed FIFO scheduler across multiple instances, I get the following error:

autogluon.protocol.core - CRITICAL - Failed to deserialize 
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/distributed/protocol/core.py", line 106, in loads
    header = msgpack.loads(header, use_list=False, **msgpack_opts)
  File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
ValueError: tuple is not allowed for map key
autogluon.utils_comm - INFO - Got an unexpected error while collecting from workers: tuple is not allowed for map key

It seems to be related to this issue. Is there a reason why we pin distributed==2.6.0?
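The traceback shows the `ValueError` being raised inside `msgpack.loads(header, use_list=False, ...)`. The sketch below tries to reproduce that message in isolation. It rests on an assumption the issue does not confirm: that the installed msgpack enforces its `strict_map_key` check, which rejects map keys that are neither `str` nor `bytes`. The helper name `tuple_key_roundtrip` and the sample payload are hypothetical.

```python
def tuple_key_roundtrip(strict):
    """Pack a dict keyed by a tuple, unpack it, and return the result or the error text."""
    try:
        import msgpack  # third-party; assumed to be installed alongside distributed
    except ImportError:
        return "msgpack is not installed"

    # Hypothetical payload: a tuple key is packed as a msgpack array.
    payload = msgpack.packb({("task", 1): "done"}, use_bin_type=True)
    try:
        # use_list=False mirrors the call in the traceback: arrays decode to tuples.
        return msgpack.unpackb(
            payload, use_list=False, raw=False, strict_map_key=strict
        )
    except (ValueError, TypeError) as exc:
        # TypeError covers msgpack releases that predate the strict_map_key option.
        return f"{type(exc).__name__}: {exc}"


print(tuple_key_roundtrip(strict=True))
print(tuple_key_roundtrip(strict=False))
```

If the first call reports an error and the second returns the dictionary, the failure probably comes from a mismatch between the installed msgpack and the pinned distributed release rather than from the scheduler code itself.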

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
aaronkl commented, Mar 11, 2020

OK, it seems the problem went away with the latest changes in the master branch.

1 reaction
aaronkl commented, Mar 5, 2020

the following code snippet should reproduce the error:

import autogluon as ag
import numpy as np

# Placeholder: add the list of worker IP addresses here before running.
dist_ip_addrs = []

# Search over p in [0, 1]; epochs is a fixed argument.
@ag.args(p=ag.Real(0, 1),
         epochs=10)
def train_fn(args, reporter, **kwargs):
    for e in range(args.epochs):
        epoch = e + 1
        # Draw a random 0/1 result with success probability p.
        r = np.random.binomial(1, p=args.p)

        # Report the result for this epoch back to the scheduler.
        reporter(epoch=epoch, result=r)

scheduler = ag.scheduler.FIFOScheduler(train_fn,
                                       dist_ip_addrs=dist_ip_addrs,
                                       resource={'num_cpus': 2, 'num_gpus': 0},
                                       num_trials=100,
                                       reward_attr='result',
                                       time_attr='epoch')
scheduler.run()
scheduler.join_jobs()

the versions of the relevant dependencies are:

asn1crypto 0.24.0
attrs 19.3.0
awscli 1.18.13
boto3 1.9.176
botocore 1.12.176
catboost 0.22
ConfigSpace 0.4.10
Cython 0.29.13
dask 2.6.0
decorator 4.4.2
distributed 2.6.0
docutils 0.14
gluoncv 0.6.0
gluonnlp 0.8.1
graphviz 0.8.4
mxnet 1.6.0
mxnet-mkl 1.4.1
numpy 1.18.1
onnx 1.4.1
terminado 0.8.3

let me know if you require further information
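To compare another environment against the versions listed above, a small check along these lines may help. It is only a sketch: the `PACKAGES` tuple and the `installed_versions` helper are hypothetical names, and msgpack is included as an assumption because it appears in the traceback, even though it is not in the reported version list. It relies on `importlib.metadata`, so it needs Python 3.8 or later.

```python
from importlib import metadata

# Packages assumed to matter for this error; msgpack is added because it
# appears in the traceback, not because the report lists it.
PACKAGES = ("autogluon", "dask", "distributed", "msgpack")


def installed_versions(names=PACKAGES):
    """Return a {package: version} dict, with None for packages that are missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name} {version or 'not installed'}")
```

Running this on the scheduler node and on each worker makes it easier to spot a machine whose dask, distributed, or msgpack version differs from the rest.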
