Missing checks for rmw handle in rclpy_create_publisher

Required Info:

  • Operating System: Ubuntu 20.04
  • Installation type: Source install
  • Version or commit hash: foxy
  • DDS implementation: Fast-RTPS
  • Client library (if applicable): rclpy

Feature request

Hi, this issue is in the gray area between bug report and feature request.

When an application creates a publisher through rclcpp, it invokes the following rcl APIs:

  • rcl_get_zero_initialized_publisher to get a handle,
  • rcl_publisher_init to initialize the publisher,
  • rcl_publisher_get_rmw_handle, followed by a NULL check to make sure the rmw handle actually exists.

However, the last check is missing in rclpy_create_publisher of rclpy. In other words, rclpy may think it successfully created a publisher even when rmw_handle is asynchronously set to NULL.
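The three-step sequence and the final check can be sketched in Python as follows. This is only an illustrative model, not the actual rcl, rclcpp, or rclpy code: `FakePublisher`, `FakePublisherImpl`, the helper functions, and the error message are all hypothetical stand-ins for the C structures and calls named above, with `None` playing the role of NULL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakePublisherImpl:
    # Hypothetical stand-in for publisher->impl; None models a NULL rmw_handle.
    rmw_handle: Optional[object] = None


@dataclass
class FakePublisher:
    # Hypothetical stand-in for rcl_publisher_t.
    impl: Optional[FakePublisherImpl] = None


def get_zero_initialized_publisher() -> FakePublisher:
    # Models rcl_get_zero_initialized_publisher: a publisher with no impl yet.
    return FakePublisher()


def publisher_init(pub: FakePublisher, rmw_handle: Optional[object]) -> None:
    # Models rcl_publisher_init. The handle is passed in so the sketch can
    # simulate a publisher whose rmw handle ended up missing.
    pub.impl = FakePublisherImpl(rmw_handle=rmw_handle)


def publisher_get_rmw_handle(pub: FakePublisher) -> Optional[object]:
    # Models rcl_publisher_get_rmw_handle: returns None instead of a handle
    # when the publisher is not fully set up.
    if pub.impl is None:
        return None
    return pub.impl.rmw_handle


def create_publisher_checked(rmw_handle: Optional[object]) -> FakePublisher:
    pub = get_zero_initialized_publisher()
    publisher_init(pub, rmw_handle)
    # The check this issue asks for: fail at creation time if the handle is missing.
    if publisher_get_rmw_handle(pub) is None:
        raise RuntimeError("failed to get rmw handle for publisher")
    return pub
```

With a valid handle, `create_publisher_checked` returns the publisher; with `None`, it raises at creation time instead of handing back a publisher that will fail later.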

I found cases where this becomes problematic. Consider an rclpy application that creates a publisher and publishes messages:

  1. After the call to rcl_publisher_init, if publisher->impl->rmw_handle becomes NULL for any reason, the error goes undetected in rclpy_create_publisher.
  2. rclpy_create_publisher then returns publisher_capsule to node.py.
  3. The returned capsule is used to create an rclpy.publisher.Publisher instance.
  4. During the initialization of the Publisher instance, a QoSEventHandler object is created, which internally calls _rclpy.rclpy_create_event (in rclpy qos_event.py) -> rcl_publisher_event_init (in rclpy _rclpy_qos_event.c) -> rmw_publisher_event_init (in rcl event.c) -> rmw_publisher_event_init (rmw_implementation) -> rmw_publisher_event_init (rmw_fastrtps). There, it segfaults when dereferencing a null pointer.

Core dump:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f7781a45090 in rmw_publisher_event_init () from /home/seulbae/workspace/ros2_foxy/install/rmw_fastrtps_cpp/lib/librmw_fastrtps_cpp.so 
---
RAX  0x0
RBP  0x7ffdd13adb00 —▸ 0x7ffdd13adb40 —▸ 0x7ffdd13adbc0 —▸ 0x7ffdd13adc40 —▸ 0x7f778264e400 ◂— ...
RSP  0x7ffdd13adae0 —▸ 0x7ffdd13adb20 ◂— 0x0
RIP  0x7f7781a45090 (rmw_publisher_event_init+27) ◂— 0xf0458b4808488b48
---
 ► 0x7f7781a45090 <rmw_publisher_event_init+27>    mov    rcx, qword ptr [rax + 8] 
---
pwndbg> bt                                                                                                                                                               
#0  0x00007f7781a45090 in rmw_publisher_event_init () from /home/seulbae/workspace/ros2_foxy/install/rmw_fastrtps_cpp/lib/librmw_fastrtps_cpp.so
#1  0x00007f7782034a62 in rmw_publisher_event_init () from /home/seulbae/workspace/ros2_foxy/install/rmw_implementation/lib/librmw_implementation.so
#2  0x00007f7782051a44 in rcl_publisher_event_init () from /home/seulbae/workspace/ros2_foxy/install/rcl/lib/librcl.so
#3  0x00007f7783dac5f7 in rclpy_create_event () from /home/seulbae/workspace/ros2_foxy/install/rclpy/lib/python3.8/site-packages/rclpy/_rclpy.cpython-38-x86_64-linux-gnu.so

The effect of the missing pointer manifests silently at a location far from the fault site, which makes debugging tricky. This issue could have been prevented by sanity-checking the result of rcl_publisher_get_rmw_handle, as rclcpp does. An rclcpp application does not suffer from the same issue, because the missing rmw handle is caught right away.
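The difference in where the failure surfaces can be illustrated with a small Python model. Everything here is a hypothetical stand-in rather than real rclpy or rmw code: dictionaries model the publisher, `None` models a NULL handle, and an `AttributeError` plays the role of the segfault inside the innermost `rmw_publisher_event_init`.

```python
def event_init(rmw_handle):
    # Models the innermost event initialization, which reads a field of the
    # handle. With None this raises AttributeError (the analogue of the
    # null-pointer dereference in the core dump).
    return rmw_handle.options


def create_event(publisher):
    # Models the later event-creation step that uses the stored handle.
    return event_init(publisher["rmw_handle"])


def create_publisher_unchecked(rmw_handle):
    # Current behavior described in this issue: no check on the handle.
    return {"rmw_handle": rmw_handle}


def create_publisher_checked(rmw_handle):
    # Proposed behavior: reject a missing handle at creation time.
    if rmw_handle is None:
        raise RuntimeError("failed to get rmw handle for publisher")
    return {"rmw_handle": rmw_handle}


def failure_site(create):
    # Reports where a missing handle is first noticed.
    try:
        publisher = create(None)
    except RuntimeError:
        return "create_publisher"
    try:
        create_event(publisher)
    except AttributeError:
        return "event_init"
    return "none"
```

In this model, `failure_site(create_publisher_unchecked)` reports the failure in `event_init`, several calls away from where the handle went missing, while `failure_site(create_publisher_checked)` reports it in `create_publisher`.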

P.S. The documentation for rcl states the following about rcl_publisher_get_rmw_handle:

The returned handle is made invalid if the publisher is finalized or if rcl_shutdown() is called. The returned handle is not guaranteed to be valid for the life time of the publisher as it may be finalized and recreated itself. Therefore it is recommended to get the handle from the publisher using this function each time it is needed and avoid use of the handle concurrently with functions that might change it.

Any thoughts? Thanks!

Issue Analytics

  • State:open
  • Created 2 years ago
  • Comments:7 (4 by maintainers)

Top GitHub Comments

1 reaction
squizz617 commented, Nov 8, 2021

Sure thing. Will open a PR and let you know! Thanks.

0 reactions
squizz617 commented, Nov 18, 2021

I opened PR #851 for this.
