Mac OS is crashing when debugger stops on a breakpoint
I’ve encountered an issue on multiple Mac laptops running the latest macOS Catalina.
The scenario: two processes run Feathers services with feathers-distributed, and the first process is stopped at a debugger breakpoint. While it is stopped, the second process keeps sending hello messages on the channels. This causes “socket stress” and leads to a TCP Zero-Window condition. A short while later, the OS crashes and reboots (apparently an Apple bug).
The solution I’ve come up with is to run the second process as a fork of the first and send it a heartbeat message every second. When the fork detects the absence of heartbeat messages for more than 2 seconds, it stops all channels opened by feathers-distributed, and it resumes them once a heartbeat is received again.
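The heartbeat detection described above could be sketched roughly as follows. This is only an illustration of the idea, not the code from the project: `createHeartbeatWatchdog`, its options, and the `onLost`/`onRestored` callbacks are hypothetical names, and the injectable `now` clock is an assumption added to make the logic easy to exercise.

```javascript
// Hypothetical helper (not part of feathers-distributed): tracks the time of
// the last heartbeat and reports when heartbeats go missing or come back.
const createHeartbeatWatchdog = ({ timeoutMs = 2000, onLost, onRestored, now = Date.now }) => {
  let lastBeat = now();
  let paused = false;
  return {
    // Call whenever a heartbeat message arrives from the parent process.
    beat() {
      lastBeat = now();
      if (paused) {
        paused = false;
        onRestored();
      }
    },
    // Call periodically to detect that heartbeats have stopped.
    check() {
      if (!paused && now() - lastBeat > timeoutMs) {
        paused = true;
        onLost();
      }
    },
    isPaused: () => paused
  };
};

// Possible wiring, assuming the second process was started with
// child_process.fork() and that app, stopChannels and startChannels exist:
//
//   parent:  setInterval(() => child.send('heartbeat'), 1000);
//   child:   const watchdog = createHeartbeatWatchdog({
//              onLost: () => stopChannels(app),
//              onRestored: () => startChannels(app)
//            });
//            process.on('message', m => { if (m === 'heartbeat') watchdog.beat(); });
//            setInterval(() => watchdog.check(), 500);
```

When the parent is held on a breakpoint its timer no longer fires, so the heartbeats stop on their own and the fork pauses its channels after the timeout.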
I’m wondering if this is a known issue, because it is easily reproduced locally with either Redis or broadcast, though the workaround applies only to Redis. With broadcast, the process fails after resuming the channels because the socket has been closed.
This is the gist of stopping/resuming all channels:
// Stop discovery on every channel opened by feathers-distributed.
const stopChannels = app => {
  stop(app.serviceSubscriber);
  stop(app.servicePublisher);
  for (const service of Object.values(app.services)) {
    stop(service.requester);
    stop(service.responder);
    stop(service.serviceEventsSubscriber);
    stop(service.serviceEventsPublisher);
  }
};

// Resume discovery on the same set of channels.
const startChannels = app => {
  start(app.serviceSubscriber);
  start(app.servicePublisher);
  for (const service of Object.values(app.services)) {
    start(service.requester);
    start(service.responder);
    start(service.serviceEventsSubscriber);
    start(service.serviceEventsPublisher);
  }
};

// A channel may be undefined, so guard before touching its discovery object.
const stop = channel => {
  if (channel) {
    channel.discovery.stop();
  }
};

const start = channel => {
  if (channel) {
    channel.discovery.start();
  }
};
This could be integrated into the library, since the application itself should not have to manage the list of open channels.
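As a rough idea of what a library-level helper might look like, the gist above can be collapsed into one function that collects the channels and toggles them. The names `CHANNEL_KEYS`, `listChannels` and `setChannelsRunning` are hypothetical and do not exist in feathers-distributed. The property names are taken from the gist, and the extra check for a missing `discovery` object is an assumption.

```javascript
// Hypothetical sketch; property names mirror the gist in this issue.
const CHANNEL_KEYS = ['requester', 'responder', 'serviceEventsSubscriber', 'serviceEventsPublisher'];

// Gather every channel that exists on the app and on its services.
const listChannels = app => {
  const channels = [app.serviceSubscriber, app.servicePublisher];
  for (const service of Object.values(app.services || {})) {
    for (const key of CHANNEL_KEYS) {
      channels.push(service[key]);
    }
  }
  // Keep only channels that are defined and expose a discovery object.
  return channels.filter(channel => channel && channel.discovery);
};

// Start (running = true) or stop (running = false) discovery on all channels.
// Returns the number of channels that were toggled.
const setChannelsRunning = (app, running) => {
  const channels = listChannels(app);
  for (const channel of channels) {
    if (running) {
      channel.discovery.start();
    } else {
      channel.discovery.stop();
    }
  }
  return channels.length;
};
```

With something like this exposed by the library, the application would only call `setChannelsRunning(app, false)` and `setChannelsRunning(app, true)` without knowing which channels exist.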
Issue Analytics
- Created 4 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Closing in favor of https://github.com/kalisio/feathers-distributed/issues/48.
Thanks. It usually takes a minute or two to crash the OS with ~150 services (without Feathers events publishing), and for a developer, staying on a breakpoint for more than a minute is not a rare use case.