Ros2 node list stops working after unrelated node crashes
Bug report
Required Info:
- Operating System:
  - Ubuntu 20.04
- Installation type:
  - from source (master)
- Version or commit hash:
  - 44a654c9077252eed50b2a6d9034ba47b634bf98 (master)
- DDS implementation:
  - rmw_fastrtps_cpp
- Client library (if applicable):
  - N/A
Steps to reproduce issue
Unfortunately I don’t have detailed steps to reproduce, which makes this hard to debug. The error might not happen for days and then turn up constantly. I have various nodes running, and somehow one of them seems to brick the whole ROS system. I’m not sure how this is even possible, but once it happens, only a complete reboot of the computer helps. If I try to use commands like ros2 node list or ros2 service list, I get the following stack trace:
Traceback (most recent call last):
  File "/home/username/ros2_rolling/install/ros2cli/bin/ros2", line 11, in <module>
    load_entry_point('ros2cli', 'console_scripts', 'ros2')()
  File "/home/username/ros2_rolling/build/ros2cli/ros2cli/cli.py", line 89, in main
    rc = extension.main(parser=parser, args=args)
  File "/home/username/ros2_rolling/build/ros2node/ros2node/command/node.py", line 37, in main
    return extension.main(args=args)
  File "/home/username/ros2_rolling/build/ros2node/ros2node/verb/list.py", line 38, in main
    node_names = get_node_names(node=node, include_hidden_nodes=args.all)
  File "/home/username/ros2_rolling/build/ros2node/ros2node/api/__init__.py", line 60, in get_node_names
    node_names_and_namespaces = node.get_node_names_and_namespaces()
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1109, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1450, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1153, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1169, in single_request
    return self.parse_response(resp)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1341, in parse_response
    return u.close()
  File "/usr/lib/python3.8/xmlrpc/client.py", line 655, in close
    raise Fault(**self._stack[0])
xmlrpc.client.Fault: <Fault 1: "<class 'rclpy._rclpy_pybind11.InvalidHandle'>:cannot use Destroyable because destruction was requested">
I have a node which seems to be the cause of this issue, but I’m not totally sure.
The node uses zeromq to interface with another system, from which it receives parameters and a state. It configures another node with those parameters and that state, and additionally monitors whether that node is online using ros2 node list.
The node has the following structure. Some details, like the zeromq parts, are left out, but I hope I left in enough to be useful:
import signal
import subprocess
import threading
import time

import rclpy
from rclpy.node import Node

# coordinator, _logger and the zeromq/lifecycle helpers (create_zeromq_server,
# parse_msg, load_parameter_dict, call_change_states, ...) are among the
# details left out of this listing.


class Server(Node):
    def __init__(self):
        self.name = "my_name"
        super().__init__(self.name + "_client")

    def __enter__(self):
        self.server = create_zeromq_server(callback=self.request_received)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.server.terminate()

    def request_received(self, msg):
        params, state = parse_msg(msg)
        # I have another lifecycle node which this server controls which is named self.name
        # thus the name of this node is self.name + "_client"
        load_parameter_dict(node=self, node_name=self.name, parameter_dict=params)
        call_change_states(node=self, transitions=state)


def monitor(pub, msg):
    # Runs in a background thread and shells out to the ros2 CLI on every pass.
    while coordinator.running:
        nodes = subprocess.check_output("ros2 node list", shell=True).decode().split()
        if "/name" in nodes:
            _logger.info("Name is online")
            msg.field = "online"
            pub.publish(msg)
        else:
            _logger.info("Name is offline")
            msg.field = "offline"
            pub.publish(msg)


def stop(signum, frame):
    rclpy.shutdown()
    _logger.info("Signal received")
    coordinator.running = False


def main():
    signal.signal(signal.SIGTERM, stop)
    rclpy.init()
    msg = create_zeromq_msg()
    pub = create_zeromq_pub()
    pub.start()
    threading.Thread(target=monitor, args=(pub, msg)).start()
    with Server() as server:
        _logger.info("Started server")
        while coordinator.running:
            _logger.debug("Running")
            time.sleep(2)
    _logger.info("Closed server")


if __name__ == '__main__':
    main()
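As written, the monitor loop above calls ros2 node list back to back with no pause, and an exception from check_output would end the monitoring thread. Whether that is related to the failure is not established. The following is only a minimal sketch of a more defensive polling loop, assuming the same CLI-based check is kept. The names list_nodes, poll_node, is_running, report and lister are hypothetical and not part of the original node.

```python
import subprocess
import time


def list_nodes(cmd=("ros2", "node", "list"), timeout=10.0):
    """Return the set of node names printed by cmd, or None if the call failed."""
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout, check=True
        )
    except (subprocess.SubprocessError, OSError):
        # Covers a non-zero exit status, a timeout, and a missing executable.
        return None
    return set(result.stdout.split())


def poll_node(name, is_running, report, interval=2.0, lister=list_nodes):
    """Report 'online', 'offline' or 'unknown' for name until is_running() is false."""
    while is_running():
        nodes = lister()
        if nodes is None:
            # The CLI itself failed, so the node's state cannot be determined.
            report("unknown")
        else:
            report("online" if name in nodes else "offline")
        # Pause between polls so CLI processes are not spawned back to back.
        time.sleep(interval)
```

In the node above, report could set msg.field and call pub.publish(msg), and is_running could return coordinator.running. Reporting "unknown" separately keeps a broken CLI from being mistaken for an offline node.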
Expected behavior
I would expect that even if a node somehow fails, it does not take down the whole system. The fact that diagnostics like ros2 node list no longer work also makes this hard to debug.
Actual behavior
ros2 node list stops working and fails with the traceback above. I would expect it to still work.
Additional information
I tried the following MRE, but it doesn’t cause the issue. I guess that is as intended, because the issue says it’s fixed.
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (1 by maintainers)
Top GitHub Comments
Interesting information. I also had the idea of a ros2 daemon; nice to see that it’s already implemented. I will try the --no-daemon option next time it happens, and try restarting the daemon to see if that helps.
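The xmlrpc frames in the traceback suggest that the CLI was querying the ros2 daemon, so the check described in this comment can be written down as a small script. This is only a sketch of that plan, not a verified fix: it tries the normal query, then the same query with --no-daemon, and restarts the daemon only if the direct query works. The helper names run_cli and diagnose_node_list and the returned strings are hypothetical.

```python
import subprocess


def run_cli(args, timeout=15.0):
    """Run a command; return its stdout on success, or None on any failure."""
    try:
        done = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return done.stdout


def diagnose_node_list(run=run_cli):
    """Classify a failing `ros2 node list` as a daemon problem or something else."""
    if run(["ros2", "node", "list"]) is not None:
        return "ok"
    if run(["ros2", "node", "list", "--no-daemon"]) is None:
        # Even the direct query fails, so the daemon is not the only problem.
        return "failing without daemon"
    # The direct query works, so only the daemon path is affected: restart it.
    run(["ros2", "daemon", "stop"])
    run(["ros2", "daemon", "start"])
    if run(["ros2", "node", "list"]) is not None:
        return "fixed by daemon restart"
    return "daemon still failing"
```

Calling diagnose_node_list() the next time the failure appears would show whether a daemon restart is enough or whether the problem sits below the daemon. The run parameter exists only so the logic can be exercised without a ROS installation.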
A new kind of error started appearing:
with this traceback popping up shortly after in the log:
While I was writing this bug report, my computer completely froze, which seems very odd. No mouse movement and no keyboard shortcuts were possible; I couldn’t even drop to a terminal. Something in ROS’s lower levels seems to have caused a complete system halt.
I would really like to get to the bottom of what is causing these sporadic crashes of this node, but I’m not sure where to start.