[BUG] Minion did not return. [No response] with some random minions

Description

On masters that have been running for a few weeks, salt commands fail with “Minion did not return. [No response]” on some random minions. The minions that fail to respond are not the same when the salt command is rerun. I can reproduce the issue by targeting a single minion with a test.ping command. Restarting the master service solves the issue, but it appears again after some days or weeks of uptime.

Setup

One master with a few registered minions.

Steps to Reproduce the behavior

The issue is not easy to reproduce: the master service must have been running for several days or weeks, and the failure appears randomly. I reproduced it with a single minion and a test.ping command.

  • The salt command line creates the test.ping job
  • The minion executes the job and sends the return to the master
  • The salt command line doesn’t receive the response and creates a find_job job
  • The minion answers the find_job with an empty response because the initial job has already finished
  • The salt command ends with “Minion did not return. [No response]”
  • At the same time, the master event bus doesn’t display the response to the test.ping command
  • A jobs.lookup_jid correctly retrieves the response sent by the minion

Some responses seem to be dropped by the master event bus.
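The symptom can be checked mechanically against a captured list of event tags. The following is a minimal sketch, not part of Salt: the helper name `missing_returns` is hypothetical, and it assumes return events use the `salt/job/<jid>/ret/<minion id>` tag format seen in the event bus output of this report.

```python
def missing_returns(event_tags, jid, minions):
    """Return the minions whose return event for `jid` never appeared on the bus.

    Hypothetical helper: it only inspects tag strings and does not talk to Salt.
    """
    prefix = "salt/job/{}/ret/".format(jid)
    # Minion ids that did publish a return event for this jid
    returned = {tag[len(prefix):] for tag in event_tags if tag.startswith(prefix)}
    return sorted(set(minions) - returned)


# Tags copied from the event bus capture in this report
tags = [
    "20200618175516640328",
    "salt/job/20200618175516640328/new",
    "20200618175521762170",
    "salt/job/20200618175521762170/new",
    "salt/job/20200618175521762170/ret/myserver",
]

# test.ping job: no ret event was published for myserver
print(missing_returns(tags, "20200618175516640328", ["myserver"]))
# find_job job: the ret event is present
print(missing_returns(tags, "20200618175521762170", ["myserver"]))
```

With the tags from this report, the test.ping jid is reported as missing its return while the find_job jid is not, which matches the behavior described in the steps.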

master logs

# salt -l debug myserver test.ping
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/master', u'myserver.dsone.3ds.com_master', u'tcp://127.0.0.1:4506', u'clear')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://127.0.0.1:4506
[DEBUG   ] Trying to connect to: tcp://127.0.0.1:4506
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/20/1aa1dda811f2bdb606bb78bba4ff9f3da4f8ad23a7da18d121a25ee34fb5b7/.minions.p
[DEBUG   ] get_iter_returns for jid 20200618175516640328 sent to set(['myserver']) will timeout at 17:55:21.656635
[DEBUG   ] Checking whether jid 20200618175516640328 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/master', u'myserver.dsone.3ds.com_master', u'tcp://127.0.0.1:4506', u'clear')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://127.0.0.1:4506
[DEBUG   ] Trying to connect to: tcp://127.0.0.1:4506
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] Passing on saltutil error. Key 'u'retcode' missing from client return. This may be an error in the client.
[DEBUG   ] return event: {'myserver': {u'failed': True}}
myserver:
    Minion did not return. [No response]
[DEBUG   ] Closing IPCMessageSubscriber instance
ERROR: Minions returned with non-zero exit code

master event bus

# salt-run state.event pretty=True
20200618175516640328    {
    "_stamp": "2020-06-18T15:55:16.640731",
    "minions": [
        "myserver"
    ]
}
salt/job/20200618175516640328/new       {
    "_stamp": "2020-06-18T15:55:16.641871",
    "arg": [],
    "fun": "test.ping",
    "jid": "20200618175516640328",
    "minions": [
        "myserver"
    ],
    "missing": [],
    "tgt": "myserver",
    "tgt_type": "glob",
    "user": "root"
}
20200618175521762170    {
    "_stamp": "2020-06-18T15:55:21.762531",
    "minions": [
        "myserver"
    ]
}
salt/job/20200618175521762170/new       {
    "_stamp": "2020-06-18T15:55:21.763894",
    "arg": [
        "20200618175516640328"
    ],
    "fun": "saltutil.find_job",
    "jid": "20200618175521762170",
    "minions": [
        "myserver"
    ],
    "missing": [],
    "tgt": [
        "myserver"
    ],
    "tgt_type": "list",
    "user": "root"
}
salt/job/20200618175521762170/ret/myserver   {
    "_stamp": "2020-06-18T15:55:21.861260",
    "cmd": "_return",
    "fun": "saltutil.find_job",
    "fun_args": [
        "20200618175516640328"
    ],
    "id": "myserver",
    "jid": "20200618175521762170",
    "master_id": "myserver",
    "retcode": 0,
    "return": {},
    "success": true
}

job lookup

[root@myserver ~]# salt-run -l info jobs.lookup_jid 20200618175516640328
myserver:
    True
[INFO    ] Runner completed: 20200619091453151012

minion logs

2020-06-18 17:55:16,652 [salt.minion      :1482][INFO    ][21477] User root Executing command test.ping with jid 20200618175516640328
2020-06-18 17:55:16,653 [salt.minion      :1489][DEBUG   ][21477] Command details {u'tgt_type': u'glob', u'jid': u'20200618175516640328', u'tgt': u'myserver', u'ret': u'', u'user': u'root', u'arg': [], u'fun': u'test.ping', u'master_id': u'myserver'}
2020-06-18 17:55:16,657 [salt.utils.process:860 ][DEBUG   ][21477] Subprocess SignalHandlingMultiprocessingProcess-1:8-Job-20200618175516640328 added
2020-06-18 17:55:16,716 [salt.utils.lazy  :104 ][DEBUG   ][22892] LazyLoaded jinja.render
2020-06-18 17:55:16,719 [salt.utils.lazy  :104 ][DEBUG   ][22892] LazyLoaded yaml.render
2020-06-18 17:55:16,721 [salt.minion      :1609][INFO    ][22892] Starting a new job 20200618175516640328 with PID 22892
2020-06-18 17:55:16,724 [salt.utils.lazy  :107 ][DEBUG   ][22892] Could not LazyLoad {0}.allow_missing_func: '{0}.allow_missing_func' is not available.
2020-06-18 17:55:16,742 [salt.utils.lazy  :104 ][DEBUG   ][22892] LazyLoaded test.ping
2020-06-18 17:55:16,743 [salt.loaded.int.module.test:124 ][DEBUG   ][22892] test.ping received for minion 'myserver'
2020-06-18 17:55:16,743 [salt.minion      :807 ][DEBUG   ][22892] Minion return retry timer set to 10 seconds (randomized)
2020-06-18 17:55:16,744 [salt.minion      :1937][INFO    ][22892] Returning information for job: 20200618175516640328
2020-06-18 17:55:16,745 [salt.transport.zeromq:138 ][DEBUG   ][22892] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/minion', u'myserver', u'tcp://10.81.105.213:4506', u'aes')
2020-06-18 17:55:16,746 [salt.crypt       :464 ][DEBUG   ][22892] Initializing new AsyncAuth for (u'/etc/salt/pki/minion', u'myserver', u'tcp://10.81.105.213:4506')
2020-06-18 17:55:16,747 [salt.transport.zeromq:209 ][DEBUG   ][22892] Connecting the Minion to the Master URI (for the return server): tcp://10.81.105.213:4506
2020-06-18 17:55:16,748 [salt.transport.zeromq:1189][DEBUG   ][22892] Trying to connect to: tcp://10.81.105.213:4506
2020-06-18 17:55:16,756 [salt.transport.zeromq:233 ][DEBUG   ][22892] Closing AsyncZeroMQReqChannel instance
2020-06-18 17:55:16,758 [salt.minion      :1787][DEBUG   ][22892] minion return: {u'fun_args': [], u'jid': u'20200618175516640328', u'return': True, u'retcode': 0, u'success': True, u'fun': u'test.ping', u'master_id': u'myserver'}
2020-06-18 17:55:17,717 [salt.utils.process:869 ][DEBUG   ][21477] Subprocess SignalHandlingMultiprocessingProcess-1:8-Job-20200618175516640328 cleaned up
2020-06-18 17:55:21,775 [salt.minion      :1482][INFO    ][21477] User root Executing command saltutil.find_job with jid 20200618175521762170
2020-06-18 17:55:21,776 [salt.minion      :1489][DEBUG   ][21477] Command details {u'tgt_type': u'list', u'jid': u'20200618175521762170', u'tgt': [u'myserver'], u'ret': u'', u'user': u'root', u'arg': [u'20200618175516640328'], u'fun': u'saltutil.find_job', u'master_id': u'myserver'}
2020-06-18 17:55:21,779 [salt.utils.process:860 ][DEBUG   ][21477] Subprocess SignalHandlingMultiprocessingProcess-1:9-Job-20200618175521762170 added
2020-06-18 17:55:21,838 [salt.utils.lazy  :104 ][DEBUG   ][22904] LazyLoaded jinja.render
2020-06-18 17:55:21,841 [salt.utils.lazy  :104 ][DEBUG   ][22904] LazyLoaded yaml.render
2020-06-18 17:55:21,844 [salt.minion      :1609][INFO    ][22904] Starting a new job 20200618175521762170 with PID 22904
2020-06-18 17:55:21,847 [salt.utils.lazy  :107 ][DEBUG   ][22904] Could not LazyLoad {0}.allow_missing_func: '{0}.allow_missing_func' is not available.
2020-06-18 17:55:21,850 [salt.utils.lazy  :104 ][DEBUG   ][22904] LazyLoaded saltutil.find_job
2020-06-18 17:55:21,852 [salt.minion      :807 ][DEBUG   ][22904] Minion return retry timer set to 6 seconds (randomized)
2020-06-18 17:55:21,852 [salt.minion      :1937][INFO    ][22904] Returning information for job: 20200618175521762170
2020-06-18 17:55:21,853 [salt.transport.zeromq:138 ][DEBUG   ][22904] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/minion', u'myserver', u'tcp://10.81.105.213:4506', u'aes')
2020-06-18 17:55:21,854 [salt.crypt       :464 ][DEBUG   ][22904] Initializing new AsyncAuth for (u'/etc/salt/pki/minion', u'myserver', u'tcp://10.81.105.213:4506')
2020-06-18 17:55:21,856 [salt.transport.zeromq:209 ][DEBUG   ][22904] Connecting the Minion to the Master URI (for the return server): tcp://10.81.105.213:4506
2020-06-18 17:55:21,857 [salt.transport.zeromq:1189][DEBUG   ][22904] Trying to connect to: tcp://10.81.105.213:4506
2020-06-18 17:55:21,865 [salt.transport.zeromq:233 ][DEBUG   ][22904] Closing AsyncZeroMQReqChannel instance
2020-06-18 17:55:21,866 [salt.minion      :1787][DEBUG   ][22904] minion return: {u'fun_args': [u'20200618175516640328'], u'jid': u'20200618175521762170', u'return': {}, u'retcode': 0, u'success': True, u'fun': u'saltutil.find_job', u'master_id': u'myserver'}
2020-06-18 17:55:22,717 [salt.utils.process:869 ][DEBUG   ][21477] Subprocess SignalHandlingMultiprocessingProcess-1:9-Job-20200618175521762170 cleaned up
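
The logs show that the minion returned well before the command line gave up. As a small worked check, the sketch below compares the "minion return" timestamp from the minion log with the timeout printed in the master debug output. The helper name `margin_seconds` is hypothetical, and it assumes both timestamps are in the same local time zone and on the same day.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"


def margin_seconds(returned_at, timeout_at):
    """Seconds between a minion return and the CLI timeout (positive = in time)."""
    # Minion logs use a comma before the milliseconds; normalize it to a dot
    start = datetime.strptime(returned_at.replace(",", "."), FMT)
    end = datetime.strptime(timeout_at.replace(",", "."), FMT)
    return (end - start).total_seconds()


# "minion return" line in the minion log vs. "will timeout at" in the master log
margin = margin_seconds("17:55:16,758", "17:55:21.656635")
print(round(margin, 3))  # 4.899
```

The return was logged almost five seconds before the timeout, so a slow minion does not explain the missing response.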

Expected behavior

Responses sent by minions should be returned to the command line.

Versions Report

salt --versions-report
Salt Version:
           Salt: 2019.2.4

Dependency Versions:
           cffi: 1.6.0
       cherrypy: Not Installed
       dateutil: 1.5
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: 0.26.3
        libnacl: Not Installed
       M2Crypto: 0.21.1
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: 2.19
       pycrypto: 2.6.1
   pycryptodome: 3.9.7
         pygit2: 0.26.4
         Python: 2.7.5 (default, Jun 11 2019, 14:33:56)
   python-gnupg: 0.4.4
         PyYAML: 3.10
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: redhat 7.5 Maipo
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-1127.8.2.el7.x86_64
         system: Linux
        version: Red Hat Enterprise Linux Server 7.5 Maipo

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 35 (18 by maintainers)

Top GitHub Comments

2 reactions
vskubriev commented, Dec 11, 2020

I can confirm that we also see this on 3002.2. Hopefully the next release will fix it.

1 reaction
keslerm commented, Nov 3, 2021

We are seeing the same thing with our Windows minions in somewhat higher-latency environments. We have minions in Australia and Europe that have connection issues with the master in us-east-1.

Windows minions where the latency stays below 50 ms do not need the tuning.

The Linux minions do not seem to suffer from the same issue and stay connected.

Aggressively tuning the tcp_keepalive settings on the Windows minions seems to stabilize them:

tcp_keepalive: True
tcp_keepalive_idle: 10
tcp_keepalive_cnt: 3
tcp_keepalive_intvl: 10
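
To see what these values imply, here is a rough estimate under standard TCP keepalive semantics: probing starts after `idle` seconds without traffic, probes are sent every `intvl` seconds, and the connection is dropped after `cnt` unanswered probes. The helper name is hypothetical, and the exact timing depends on the operating system's TCP stack, so treat the result as an approximation.

```python
def keepalive_detection_seconds(idle, cnt, intvl):
    """Approximate time to detect a dead peer on an otherwise idle connection."""
    # Wait `idle` seconds, then allow `cnt` probes spaced `intvl` seconds apart
    return idle + cnt * intvl


# Values from the minion configuration above
print(keepalive_detection_seconds(idle=10, cnt=3, intvl=10))  # 40
```

With the settings above, a silently broken connection should be noticed after roughly 40 seconds of inactivity.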