question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Publishing large data is 30x-100x slower than for rclcpp

See original GitHub issue

TL;DR: Publishing a 10MB Pointcloud takes only 2.8 ms in rclypp but 92 ms in rclpy.

We observed significant latency problems when publishing large data like images or pointclouds via rclpy. Investigating the problem, we found that publishing large data of any kind is 30-100x slower in Python than in C++.

We found the Python Publisher.publish(msg) call takes especially long compared to rclcpp. Publishing a 640x480 Pointcloud of size ~10MB with 15 FPS is not possible with rclpy even on our quite powerful machine (Ryzen 3900X), let alone our robot (Jetson Xavier NX). The time taken for both seems to be about linear with the data size.

Reproduction: https://github.com/karl-schulz/ros2_latency

Bug report

  • Operating System:
    • Ubuntu 20.04 and Ubuntu 18.04
  • Installation type:
    • Both binaries and built from source
  • Version or commit hash:
    • (like in current binaries / rclpy foxy branch)
  • DDS implementation:
    • Both Fast-RTPS and CycloneDDS

Steps to reproduce issue

I created a package for easy reproduction: https://github.com/karl-schulz/ros2_latency

Basically, it consists of:

  • A node source publishing a random pointcloud of X MB size, X is controllable over a parameter
  • Two repeaters for C++ and Python, respectively, which
    • Copy the pointcloud
    • Update the timestamp
    • Re-publish it on different topics
  • A node measure which subscribes to both topics and compares the timestamps with the current ROS time

Expected behavior

  • rclpy and rclcpp being close in performance, with slightly higher latency for rclpy
  • Both rclpy and rclcpp being able topublish a Pointcloud of a 640x480 RGB-D image with at least 30 FPS on a fast machine

Actual behavior

  • Even on a very fast desktop machine, rclpy takes ~100ms for publishing 10MB of pointcloud data
  • For comparison, rclcpp take only between 1/30 and 1/100 of the time
  • Using rclpy with high FPS is not possible

Additional information

Tested on:

  • A desktop PC with a Ryzen 3800X and Ubuntu 20
  • Another machine with a Rytzen 3700X and Ubuntu 18 with ROS2 Foxy built from source
  • A NVIDIA Jetson Xavier NX L4T (Ubuntu 18) and ROS2 Foxy built from source

Tried Solutions

We were pretty desperate and already tried:

  • Switching the RWM from FastRTPS to CycloneDDS
    (however the good performance on rclcpp indicates that it is not a middleware problem)
  • Tuning CycloneDDS with a custom config XML and a higher system package size, like described here:
    https://index.ros.org/doc/ros2/Troubleshooting/DDS-tuning/ (but the same rationale applies)
  • Using PYTHONOPTIMIZE=0
  • Different QOS settings: “best effort”/“reliable”, depth=1/depth=10
  • Publishing different message types, e.g. Image messages instead of Pointloud2

This problem is actually really critical for our application. In the worst case: Is this expected behavior? This would make the ROS2 Python API useless for high-framerate or low-latency image pipelines.

Help would be really appreciated!

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Reactions:4
  • Comments:12 (8 by maintainers)

github_iconTop GitHub Comments

1reaction
sloretzcommented, Apr 9, 2021

Using callgrind after building with RelWithDebInfo seems to say the problem is in sensor_msgs__msg__point_cloud2__convert_from_py where it calls PyLong_AsUnsignedLong a bunch of times, which is almost certainly where it’s converting from the data field. Maybe there’s a faster way to do this if the Python type is already an array?

@hidmic you called the location 😃

Template for generated code https://github.com/ros2/rosidl_python/blob/5b9fe9cad6876e877cbbcf17950c47a9e753c6e6/rosidl_generator_py/resource/_msg_support.c.em#L158

Generated code in file build/sensor_msgs/rosidl_generator_py/sensor_msgs/msg/_point_cloud2_s.c

Generated code for above function
ROSIDL_GENERATOR_C_EXPORT
bool sensor_msgs__msg__point_cloud2__convert_from_py(PyObject * _pymsg, void * _ros_message)
{
  // check that the passed message is of the expected Python class
  {
    char full_classname_dest[42];
    {
      char * class_name = NULL;
      char * module_name = NULL;
      {
        PyObject * class_attr = PyObject_GetAttrString(_pymsg, "__class__");
        if (class_attr) {
          PyObject * name_attr = PyObject_GetAttrString(class_attr, "__name__");
          if (name_attr) {
            class_name = (char *)PyUnicode_1BYTE_DATA(name_attr);
            Py_DECREF(name_attr);
          }
          PyObject * module_attr = PyObject_GetAttrString(class_attr, "__module__");
          if (module_attr) {
            module_name = (char *)PyUnicode_1BYTE_DATA(module_attr);
            Py_DECREF(module_attr);
          }
          Py_DECREF(class_attr);
        }
      }
      if (!class_name || !module_name) {
        return false;
      }
      snprintf(full_classname_dest, sizeof(full_classname_dest), "%s.%s", module_name, class_name);
    }
    assert(strncmp("sensor_msgs.msg._point_cloud2.PointCloud2", full_classname_dest, 41) == 0);
  }
  sensor_msgs__msg__PointCloud2 * ros_message = _ros_message;
  {  // header
    PyObject * field = PyObject_GetAttrString(_pymsg, "header");
    if (!field) {
      return false;
    }
    if (!std_msgs__msg__header__convert_from_py(field, &ros_message->header)) {
      Py_DECREF(field);
      return false;
    }
    Py_DECREF(field);
  }
  {  // height
    PyObject * field = PyObject_GetAttrString(_pymsg, "height");
    if (!field) {
      return false;
    }
    assert(PyLong_Check(field));
    ros_message->height = PyLong_AsUnsignedLong(field);
    Py_DECREF(field);
  }
  {  // width
    PyObject * field = PyObject_GetAttrString(_pymsg, "width");
    if (!field) {
      return false;
    }
    assert(PyLong_Check(field));
    ros_message->width = PyLong_AsUnsignedLong(field);
    Py_DECREF(field);
  }
  {  // fields
    PyObject * field = PyObject_GetAttrString(_pymsg, "fields");
    if (!field) {
      return false;
    }
    PyObject * seq_field = PySequence_Fast(field, "expected a sequence in 'fields'");
    if (!seq_field) {
      Py_DECREF(field);
      return false;
    }
    Py_ssize_t size = PySequence_Size(field);
    if (-1 == size) {
      Py_DECREF(seq_field);
      Py_DECREF(field);
      return false;
    }
    if (!sensor_msgs__msg__PointField__Sequence__init(&(ros_message->fields), size)) {
      PyErr_SetString(PyExc_RuntimeError, "unable to create sensor_msgs__msg__PointField__Sequence ros_message");
      Py_DECREF(seq_field);
      Py_DECREF(field);
      return false;
    }
    sensor_msgs__msg__PointField * dest = ros_message->fields.data;
    for (Py_ssize_t i = 0; i < size; ++i) {
      if (!sensor_msgs__msg__point_field__convert_from_py(PySequence_Fast_GET_ITEM(seq_field, i), &dest[i])) {
        Py_DECREF(seq_field);
        Py_DECREF(field);
        return false;
      }
    }
    Py_DECREF(seq_field);
    Py_DECREF(field);
  }
  {  // is_bigendian
    PyObject * field = PyObject_GetAttrString(_pymsg, "is_bigendian");
    if (!field) {
      return false;
    }
    assert(PyBool_Check(field));
    ros_message->is_bigendian = (Py_True == field);
    Py_DECREF(field);
  }
  {  // point_step
    PyObject * field = PyObject_GetAttrString(_pymsg, "point_step");
    if (!field) {
      return false;
    }
    assert(PyLong_Check(field));
    ros_message->point_step = PyLong_AsUnsignedLong(field);
    Py_DECREF(field);
  }
  {  // row_step
    PyObject * field = PyObject_GetAttrString(_pymsg, "row_step");
    if (!field) {
      return false;
    }
    assert(PyLong_Check(field));
    ros_message->row_step = PyLong_AsUnsignedLong(field);
    Py_DECREF(field);
  }
  {  // data
    PyObject * field = PyObject_GetAttrString(_pymsg, "data");
    if (!field) {
      return false;
    }
    PyObject * seq_field = PySequence_Fast(field, "expected a sequence in 'data'");
    if (!seq_field) {
      Py_DECREF(field);
      return false;
    }
    Py_ssize_t size = PySequence_Size(field);
    if (-1 == size) {
      Py_DECREF(seq_field);
      Py_DECREF(field);
      return false;
    }
    if (!rosidl_runtime_c__uint8__Sequence__init(&(ros_message->data), size)) {
      PyErr_SetString(PyExc_RuntimeError, "unable to create uint8__Sequence ros_message");
      Py_DECREF(seq_field);
      Py_DECREF(field);
      return false;
    }
    uint8_t * dest = ros_message->data.data;
    for (Py_ssize_t i = 0; i < size; ++i) {
      PyObject * item = PySequence_Fast_GET_ITEM(seq_field, i);
      if (!item) {
        Py_DECREF(seq_field);
        Py_DECREF(field);
        return false;
      }
      assert(PyLong_Check(item));
      uint8_t tmp = (uint8_t)PyLong_AsUnsignedLong(item);

      memcpy(&dest[i], &tmp, sizeof(uint8_t));
    }
    Py_DECREF(seq_field);
    Py_DECREF(field);
  }
  {  // is_dense
    PyObject * field = PyObject_GetAttrString(_pymsg, "is_dense");
    if (!field) {
      return false;
    }
    assert(PyBool_Check(field));
    ros_message->is_dense = (Py_True == field);
    Py_DECREF(field);
  }

  return true;
}
0reactions
sloretzcommented, Jun 23, 2021

I think this can be closed now that ros2/rosidl_python#129 has been merged!

Read more comments on GitHub >

github_iconTop Results From Across the Web

ROS2 Performance: rclpy is 30x-100x slower than rclcpp
Investigating the problem, we found that publishing large data of any kind is 30-100x slower in Python than in C++.
Read more >
rclcpp: CPP Client Library Overview - ROS 2 Docs
These files are used to generate C++ code and data structures which are used for publishing and when receiving from a subscription.
Read more >
Low frame rate when publishing image messages in ros2
I can think of a couple reasons why the reported frequency from ros2 topic hz is reporting a lower frequency than expected.
Read more >
Create a Publisher and Subscriber in C++ | ROS 2 Foxy Fitzroy
Publisher nodes publish data to topic(s), subscriber nodes receive data from topic(s), and a publishing subscriber node can do both receive data ......
Read more >
Klepsydra ROS 2 Executor: A ring-buffer to rule them all
As the number of publishers increases and the data rate gets faster, the difference in CPU consumption becomes considerably large. At a certain ......
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found