AssertionError in Mailbox.recv() causes context thread to stay running
I am running a complex unit test locally that involves 5 nodes with various PUB/SUB and REQ/REP pairs. They’re all talking via the public IP of the computer, but everything is on the same machine, in the same JVM.
Each endpoint has its own ZMQ.Context because it’s a simulation of nodes that could be put on different computers across a network.
I’m running multithreaded. Each thread has its own ZMQ.Context object, along with its own socket (in any of the modes listed above). At the end of the test, I try to shut everything down gracefully, but I get an intermittent AssertionError:
Exception in thread "MessageSubscriber: CharlieOne" java.lang.AssertionError
at zmq.Mailbox.recv(Mailbox.java:114)
at zmq.SocketBase.process_commands(SocketBase.java:793)
at zmq.SocketBase.recv(SocketBase.java:714)
at org.zeromq.ZMQ$Socket.recv(ZMQ.java:1247)
at org.zeromq.ZMQ$Socket.recv(ZMQ.java:1235)
at MessageSubscriber$ListenerThread.run(MessageSubscriber.java:131)
“MessageSubscriber: CharlieOne” is the test thread that contains a ZMQ subscriber. The assertion that fails is in zmq.Mailbox.recv() (Mailbox.java:114 in the trace above).
Here is the code that runs in the class that owns that thread, to shut down the thread:
private ZMQ.Context context = ZMQ.context(1);
private ZMQ.Socket publisher = context.socket(ZMQ.PUB);
...
// Shut down the thread
listenerThread.run = false;
subscriber.close();
context.term();
try {
listenerThread.join();
} catch (InterruptedException ex) {
logger.warn("InterruptedException in thread.join()");
}
After the AssertionError occurs, context.term() blocks indefinitely, perhaps because the socket never really closed. I’ve tried catching the AssertionError, but that doesn’t help. The term() call still blocks indefinitely.
The problem doesn’t always happen. Sometimes the whole test runs, all the threads shut down, and the program exits. Most of the time, however, at least one of the threads blocks because of this assertion.
Issue Analytics
- Created 9 years ago
- Comments:7 (3 by maintainers)
It’s hard to say what’s wrong w/o seeing your code. But I’ve run into threads blocking during shutdown many times. This is the safest way I’ve found to shut down zeromq sockets and the context:
2 threads: Main and SocketThread. Main creates the ZContext, creates SocketThread and passes it the context, and starts SocketThread. ZContext is thread safe so it doesn’t matter which thread creates it.
SocketThread uses the context to create some zeromq sockets and does work.
Later, at shutdown time, Main needs to signal SocketThread to shut down. Do this however you want (set a volatile shutdown boolean, send it a ‘shutdown’ message on one of its zeromq sockets, whatever). Once Main sends the shutdown signal it must join() on SocketThread and wait. SocketThread must close every socket it created and then exit.
Now Main can safely terminate the context.
In your case I’d try moving context.term() into a finally block on the try around join(), so it runs only after join() returns. Be sure that no thread ever touches another thread’s sockets. And make sure every socket is closed and every worker thread has exited before calling context.term().
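Here is a minimal sketch of that shutdown order, written against the JDK only so it runs without JeroMQ on the classpath. FakeContext and FakeSocket are hypothetical stand-ins, not JeroMQ classes: in real code they would be the ZMQ.Context and ZMQ.Socket from the question, and term() would block on an open socket instead of throwing. The sleep in the loop is a placeholder for a receive with a timeout.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownOrderSketch {

    // Hypothetical stand-in for a ZeroMQ context: it only counts open sockets.
    static final class FakeContext {
        final AtomicInteger openSockets = new AtomicInteger();
        volatile boolean terminated = false;

        FakeSocket socket() {
            openSockets.incrementAndGet();
            return new FakeSocket(this);
        }

        // A real term() would block while sockets are still open. The stand-in
        // throws instead, so a wrong shutdown order fails fast rather than hanging.
        void term() {
            if (openSockets.get() != 0) {
                throw new IllegalStateException("term() called with open sockets");
            }
            terminated = true;
        }
    }

    // Hypothetical stand-in for a socket. It remembers the thread that created it
    // and refuses to be closed from any other thread.
    static final class FakeSocket {
        private final FakeContext ctx;
        private final Thread owner = Thread.currentThread();
        private boolean closed = false;

        FakeSocket(FakeContext ctx) {
            this.ctx = ctx;
        }

        void close() {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("socket closed by a foreign thread");
            }
            if (!closed) {
                closed = true;
                ctx.openSockets.decrementAndGet();
            }
        }
    }

    static final class SocketThread extends Thread {
        private final FakeContext ctx;
        volatile boolean running = true; // shutdown flag written by Main

        SocketThread(FakeContext ctx) {
            super("SocketThread");
            this.ctx = ctx;
        }

        @Override
        public void run() {
            FakeSocket socket = ctx.socket(); // created on the thread that uses it
            try {
                while (running) {
                    try {
                        Thread.sleep(10); // placeholder for a receive with a timeout
                    } catch (InterruptedException ex) {
                        break;
                    }
                }
            } finally {
                socket.close(); // closed by its owner, before the thread exits
            }
        }
    }

    // Main's side of the protocol: signal, join, then terminate the context.
    static boolean runAndShutDown() {
        FakeContext ctx = new FakeContext();
        SocketThread worker = new SocketThread(ctx);
        worker.start();
        try {
            worker.running = false; // 1. signal the worker
            worker.join();          // 2. wait until it has closed its socket and exited
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            // 3. terminate the context. If join() was interrupted the worker may
            // still hold its socket, and the fail-fast stand-in will report that.
            ctx.term();
        }
        return ctx.terminated;
    }

    public static void main(String[] args) {
        System.out.println("terminated cleanly: " + runAndShutDown());
    }
}
```

The point of the design is that Main never touches the worker’s socket. It only flips the flag and waits, so the close always happens on the thread that was using the socket.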
HTH
It is unclear if this is still an issue in the latest version of JeroMQ. Closing for now – if anyone observes this problem on the latest version, please open a new issue.