
`MultiThreadedExecutor:spin_until_future_complete` can block when the future is ready


Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • From source
  • Version or commit hash:
    • Foxy
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclpy

Steps to reproduce the issue

  • Publish a message from one node (with latching_qos)
  • In another node, subscribe to the topic with a callback that sets the result of the future
  • Start both nodes in a MultiThreadedExecutor
  • Spin until the future is complete

import rclpy
from rclpy.executors import MultiThreadedExecutor, SingleThreadedExecutor
from rclpy.node import Node
from std_msgs.msg import String
from rclpy.qos import QoSProfile, QoSDurabilityPolicy
from rclpy.task import Future
import os

latching_qos = QoSProfile(depth=1,
    durability=QoSDurabilityPolicy.RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL)

def main():

    rclpy.init()
    
    # Set up publisher
    pubnode = Node('pubnode_' + str(os.getpid()))
    pub1 = pubnode.create_publisher(String, 'topic1', latching_qos)
    msg1 = String()
    msg1.data = "hello1"
    pubnode.get_logger().info("Publishing hello1")
    pub1.publish(msg1)

    # Set up listener
    future_msgs = Future()
    subnode = Node('subnode_' + str(os.getpid()))
    subnode.create_subscription(String, 'topic1', lambda msg : ([
            subnode.get_logger().info("Received message on topic1"),
            future_msgs.set_result(msg)
    ]), latching_qos)

    # Start nodes
    exe = MultiThreadedExecutor()
    exe.add_node(pubnode)
    exe.add_node(subnode)

    future_msgs.add_done_callback(lambda fut : print("Future is done"))
    exe.spin_until_future_complete(future_msgs)

if __name__ == '__main__':
    main()
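
As a side note, the script above does not tear anything down after spinning. If you want the executor, nodes, and context shut down cleanly once spin_until_future_complete returns, you could add something like the following at the end of main() (my addition, not part of the original repro):

    # Hypothetical cleanup (not in the original report): stop the executor,
    # destroy both nodes, and shut down the rclpy context.
    exe.shutdown()
    pubnode.destroy_node()
    subnode.destroy_node()
    rclpy.shutdown()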

Expected behavior

The subnode should receive the message, set the future as complete, and then the program should exit.

Actual behavior

The subnode receives the message and sets the future as complete, but the call to `exe.spin_until_future_complete(future_msgs)` never returns.

Additional information

This only happens with the MultiThreadedExecutor. If I swap it out for the SingleThreadedExecutor, it works as expected (see the sketch below).
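
A minimal sketch of that swap (my own illustration, not code from the thread): replacing the executor is the only change needed for the posted example to finish.

# Hedged sketch: identical setup, but with SingleThreadedExecutor, with which
# spin_until_future_complete reportedly returns once the future's result is set.
exe = SingleThreadedExecutor()
exe.add_node(pubnode)
exe.add_node(subnode)
exe.spin_until_future_complete(future_msgs)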

If the pub node is running in a different process, it also works as expected.

I have also asked a question on ROS Answers here.

Here is a workspace that you can clone and run to immediately test this issue. Here is the code for the node defined in that workspace. It is very similar to the code posted in this bug report, but with a few argparse options.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 17 (3 by maintainers)

Top GitHub Comments

3 reactions
ivanpauno commented, Jul 20, 2020

As @fujitatomoya explained here, there’s no deadlock. I’ve updated the PR title accordingly.

https://github.com/ros2/rclpy/pull/605 is a proposed fix. @craigh92 @fujitatomoya it would be great if you can confirm that the proposed fix solves the issue in the posted example, thanks!

1 reaction
russkel commented, Aug 18, 2021

Sure @fujitatomoya, I will do so.
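
Until the proposed fix in https://github.com/ros2/rclpy/pull/605 is available in your rclpy, one interim workaround (my own suggestion, not taken from this thread, and assuming your rclpy's spin_until_future_complete accepts a timeout_sec argument) is to spin with a short timeout and re-check the future, so the executor re-evaluates it even if no new work arrives to wake it:

# Hedged workaround sketch: poll the future with a short timeout instead of
# blocking indefinitely; the loop exits once the future's result has been set.
while rclpy.ok() and not future_msgs.done():
    exe.spin_until_future_complete(future_msgs, timeout_sec=0.1)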

