ZMQ messages not arriving at destination
Hi Guys,
We have a very strange and critical problem and need your help. The scenario: we send a message from the client application and, for some reason, it never arrives at the server application.
- We use version 0.4.3
- In the client application, we use DEALER socket (TCP)
- In the server application, we have a ROUTER socket and Reactor thread that catches the incoming events.
- In the client application we have one ZContext. Each thread that sends a message from the client creates a shadow socket, creates the DEALER socket, sends the message to the server, and waits 5 sec for an ACK message from the server.
Part of the client code:
ZMsg msg = createMessage();
msg.send(socket);
socket.setReceiveTimeOut(5000);
response = socket.recvStr();
If we get no answer from the server, we assume there is a network or load issue on the server. BUT in our case we don’t see the message on the server side at all!
We have some traffic captures and need your help to understand the problem.
In a good scenario, we see the client push data to the server:
C->S [SYN]
S->C [SYN, ACK]
C->S [ACK]
C->S [PSH, ACK]
S->C [ACK]
S->C [PSH, ACK]
C->S [ACK]
C->S [PSH, ACK …
In a bad scenario, where we see no traffic on the server and get no ACK message in the client:
C->S [SYN]
S->C [SYN, ACK]
C->S [ACK]
S->C [PSH, ACK]
C->S [ACK]
C->S [RST, ACK]
The reset from the client comes after the timeout of 5 min… during which we got no ACK response. We then close the socket in the client with linger 0.
Can you please help? What could be the reason that the messages don’t arrive at the server?
Thanks! Kobi.
Issue Analytics
- Created 5 years ago
- Comments:24 (12 by maintainers)
ZeroMQ provides atomic message delivery AKA all or nothing. Unless it’s a bug in DEALER/ROUTER sockets, it feels like a symptom of a networking problem. I would ask these sorts of questions on the ZeroMQ mailing list, IRC, or StackOverflow.
Providing a minimal test case that recreates the issue goes a long way in helping to diagnose the underlying issue.
I think we’ve managed to reproduce this issue (the commit above).
From analyzing the traffic, it seems that the ZMQ handshake sometimes hangs midway. We can observe the same pattern as @kmualem:
Please notice that the first 3 lines are part of the TCP handshake, which seems to complete successfully. The next 2 lines are part of the ZMQ handshake. Please notice that it is the client that does not send its part of the handshake.
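To make this reading of a capture explicit, here is a minimal Java sketch (standard library only, no ZMQ calls) that scans a capture summary in the C->S [FLAGS] notation used in this thread and reports whether the client ever pushed data after the TCP handshake. The class name HandshakeTraceCheck, the Verdict enum, and the rule "a client PSH after the three-way handshake counts as the client's part of the handshake" are assumptions made for illustration; they are not part of JeroMQ or of any capture tool.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HandshakeTraceCheck {

    /** Hypothetical verdicts for a capture summary; the names are illustrative only. */
    public enum Verdict { TCP_INCOMPLETE, CLIENT_GREETING_MISSING, DATA_EXCHANGED }

    /** Classifies packets written as "C->S [SYN]" or "S->C [PSH, ACK]". */
    public static Verdict classify(List<String> packets) {
        boolean synSeen = false;
        boolean synAckSeen = false;
        boolean tcpEstablished = false;
        boolean clientPushed = false;

        for (String packet : packets) {
            String line = packet.trim();
            int open = line.indexOf('[');
            int close = line.indexOf(']');
            if (open < 0 || close < open) {
                continue; // skip lines that are not in the expected notation
            }
            boolean fromClient = line.startsWith("C->S");
            Set<String> flags = new HashSet<>();
            for (String flag : line.substring(open + 1, close).split(",")) {
                flags.add(flag.trim());
            }

            if (!tcpEstablished) {
                // Track the TCP three-way handshake: SYN, then SYN+ACK, then ACK.
                if (fromClient && only(flags, "SYN")) {
                    synSeen = true;
                } else if (!fromClient && synSeen && only(flags, "SYN", "ACK")) {
                    synAckSeen = true;
                } else if (fromClient && synAckSeen && only(flags, "ACK")) {
                    tcpEstablished = true;
                }
            } else if (fromClient && flags.contains("PSH")) {
                // Assumption: any client PSH after the TCP handshake carries
                // the client's handshake bytes or application data.
                clientPushed = true;
            }
        }

        if (!tcpEstablished) {
            return Verdict.TCP_INCOMPLETE;
        }
        return clientPushed ? Verdict.DATA_EXCHANGED : Verdict.CLIENT_GREETING_MISSING;
    }

    /** True when the packet carries exactly the expected flags and no others. */
    private static boolean only(Set<String> flags, String... expected) {
        return flags.equals(new HashSet<>(Arrays.asList(expected)));
    }

    public static void main(String[] args) {
        // The bad scenario reported at the top of this issue.
        List<String> bad = Arrays.asList(
                "C->S [SYN]", "S->C [SYN, ACK]", "C->S [ACK]",
                "S->C [PSH, ACK]", "C->S [ACK]", "C->S [RST, ACK]");
        System.out.println(classify(bad));
    }
}
```

Run against the bad scenario from the original report, the sketch flags the trace as one where the TCP connection is up but the client never pushes anything, which is the pattern described here.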
It looks to me like a race condition. I don’t know the implementation details, but I have a feeling that the client queues the ZMQ handshake messages but they never get sent (until some other event causes them to be sent/flushed, e.g. the recv timeout?).
Two additional observations:
- setHandshakeIvl seems to help (unless setImmediate(false) is also set at the same time)
- setImmediate(false) considers the connection to be completed before the ZMQ handshake is completed (shouldn’t it be after?)