Missing checks for rmw handle in rclpy_create_publisher
See original GitHub issueRequired Info:
- Operating System: Ubuntu 20.04
- Installation type: Source install
- Version or commit hash: foxy
- DDS implementation: Fast-RTPS
- Client library (if applicable): rclpy
Feature request
Hi, this issue is in the gray area between bug report and feature request.
When an application creates a publisher through rclcpp
, it invokes the following rcl
APIs:
rcl_get_zero_initialized_publisher
to get a handle,rcl_publisher_init
to initialize the publisher,rcl_publisher_get_rmw_handle
followed by a NULL check to make sure there really isrmw_handle
.
However, the last check is missing in rclpy_create_publisher
of rclpy
. In other words, rclpy
may think it successfully created a publisher even when rmw_handle
is asynchronously set to NULL.
I did find cases when this becomes problematic. Consider an rclpy
application that creates a publisher and publishes messages. For example:
- After a call to
rcl_publisher_init
, if for any reason,publisher->impl->rmw_handle
becomes NULL, that error goes undetected back in thercl_create_publisher
. - Then,
rcl_crate_publisher
returnspublisher_capsule
back tonode.py
. - The returned capsule is used for creating a
rclpy.publisher.Publisher
instance. - During the initialization of the
Publisher
instance, aQoSEventHandler
object is created, which internally calls_rclpy.rclpy_create_event
(in rclpy qos_event.py) ->rcl_publisher_event_init
(in rclpy _rclpy_qos_event.c) ->rmw_publisher_event_init
(in rcl event.c) ->rmw_publisher_event_init
(rmw_implementation) ->rmw_publisher_event_init
(rmw_fastrtps). There, it segfaults when de-referncing a null pointer.
Core dump:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f7781a45090 in rmw_publisher_event_init () from /home/seulbae/workspace/ros2_foxy/install/rmw_fastrtps_cpp/lib/librmw_fastrtps_cpp.so
---
RAX 0x0
RBP 0x7ffdd13adb00 —▸ 0x7ffdd13adb40 —▸ 0x7ffdd13adbc0 —▸ 0x7ffdd13adc40 —▸ 0x7f778264e400 ◂— ...
RSP 0x7ffdd13adae0 —▸ 0x7ffdd13adb20 ◂— 0x0
RIP 0x7f7781a45090 (rmw_publisher_event_init+27) ◂— 0xf0458b4808488b48
---
► 0x7f7781a45090 <rmw_publisher_event_init+27> mov rcx, qword ptr [rax + 8]
---
pwndbg> bt
#0 0x00007f7781a45090 in rmw_publisher_event_init () from /home/seulbae/workspace/ros2_foxy/install/rmw_fastrtps_cpp/lib/librmw_fastrtps_cpp.so
#1 0x00007f7782034a62 in rmw_publisher_event_init () from /home/seulbae/workspace/ros2_foxy/install/rmw_implementation/lib/librmw_implementation.so
#2 0x00007f7782051a44 in rcl_publisher_event_init () from /home/seulbae/workspace/ros2_foxy/install/rcl/lib/librcl.so
#3 0x00007f7783dac5f7 in rclpy_create_event () from /home/seulbae/workspace/ros2_foxy/install/rclpy/lib/python3.8/site-packages/rclpy/_rclpy.cpython-38-x86_64-linux-gnu.so
The effect of a missing pointer is silently manifested at a location that is far away from the fault site, making the debugging tricky.
Such issue could’ve prevented by sanity-checking rcl_publisher_get_rmw_handle
like how rclcpp
does. The rclcpp
application does not suffer from the same issue, as the missing rmw handle is caught right away.
P.S. The documentation for rcl
states the following about rcl_publisher_get_rmw_handle
:
The returned handle is made invalid if the publisher is finalized or if rcl_shutdown() is called. The returned handle is not guaranteed to be valid for the life time of the publisher as it may be finalized and recreated itself. Therefore it is recommended to get the handle from the publisher using this function each time it is needed and avoid use of the handle concurrently with functions that might change it.
Any thoughts? Thanks!
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (4 by maintainers)
Sure thing. Will open a PR and let you know! Thanks.
I opened PR #851 for this.