Undefined behavior for using WAMP features not agreed during feature negotiation

I have a client talking to a WAMP Basic Profile server (run by a third party). The server doesn’t support call cancellation, so I think a call made like this:

# ...
    async def call_endpoint(self, endpoint):
        try:
            resp = await self.call(endpoint, timeout=ENDPOINT_CALL_TIMEOUT_SECONDS * 1000)
        except Exception as e:  # TODO: Tighten this exception
            print(f"Error calling {endpoint}: {e}")
            raise

would ignore the timeout option and carry on running past the timeout?

I didn’t actually notice the timeout option on the call method on my first pass so just wrapped it in asyncio.wait_for with a timeout like this:

# ...
    async def call_endpoint(self, endpoint):
        try:
            resp = await asyncio.wait_for(self.call(endpoint), timeout=ENDPOINT_CALL_TIMEOUT_SECONDS)
        except (asyncio.TimeoutError) as e:
            print(f"Timeout calling {endpoint}: {e}")
        except Exception as e:  # TODO: Tighten this exception
            print(f"Error calling {endpoint}: {e}")
            raise

But that’s throwing the occasional bad_protocol error. I didn’t know what the exact issue was, but the third party gave me access to the server logs, which showed a 49 error code relating to the cancel call. The error is always thrown on the call that follows a call that timed out after 20s.
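A likely reason a cancel reaches the server at all is that `asyncio.wait_for` cancels the awaitable it wraps when the timeout expires. The sketch below shows this with the standard library only. `fake_call` is a hypothetical stand-in for `self.call(endpoint)` and does not involve Autobahn; the comment about where a client would send CANCEL is an assumption for illustration.

```python
import asyncio


async def fake_call(log):
    """Hypothetical stand-in for self.call(endpoint); never answers in time."""
    try:
        await asyncio.sleep(10)
        return "result"
    except asyncio.CancelledError:
        # Assumption: this is roughly where a WAMP client would react to the
        # cancellation, for example by sending a CANCEL message.
        log.append("cancel requested")
        raise


async def demo():
    log = []
    try:
        # wait_for cancels fake_call once the timeout expires.
        await asyncio.wait_for(fake_call(log), timeout=0.05)
    except asyncio.TimeoutError:
        log.append("timed out")
    return log


wait_for_log = asyncio.run(demo())
print(wait_for_log)
```

The cancellation is delivered to the wrapped call before `TimeoutError` reaches the caller, so the timeout and the cancel request are two separate events.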

What’s the best way to handle timeouts in this situation? I don’t want to catch the bad_protocol exception as-is, in case the client or server later changes in a way that makes the catch hide another issue. Can I at least see, from the exception itself, why bad_protocol is being thrown? I also don’t want to let the client block indefinitely on the server, because we previously saw consistent failures where the client hung forever waiting for a response that never arrived.

Should Autobahn be detecting this lack of support for call cancellation and just silently dropping the task if cancelled in this way? Or throw another exception in some way that can be handled?

I noticed https://github.com/crossbario/autobahn-python/issues/1127. The reply in that issue may describe something similar, but I don’t really see the similarity in its opening post.
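One possible workaround, sketched here with the standard library only, is to stop waiting without cancelling the underlying call, by wrapping it in `asyncio.shield`. This assumes the client only sends a cancel when the call's own future is cancelled; that assumption has not been checked against Autobahn. `call_with_timeout` and the `pending` future are hypothetical names, with `pending` standing in for `self.call(endpoint)`.

```python
import asyncio


def _consume_late_outcome(fut):
    # Retrieve a late result or error so it is not reported as unhandled.
    if not fut.cancelled():
        fut.exception()


async def call_with_timeout(awaitable, timeout):
    """Stop waiting after `timeout` seconds without cancelling the call."""
    inner = asyncio.ensure_future(awaitable)
    try:
        # shield() absorbs the cancellation that wait_for issues on timeout.
        return await asyncio.wait_for(asyncio.shield(inner), timeout)
    except asyncio.TimeoutError:
        inner.add_done_callback(_consume_late_outcome)
        raise


async def demo():
    loop = asyncio.get_running_loop()
    pending = loop.create_future()  # hypothetical stand-in for self.call(endpoint)
    try:
        await call_with_timeout(pending, timeout=0.05)
    except asyncio.TimeoutError:
        timed_out = True
    else:
        timed_out = False
    was_cancelled = pending.cancelled()
    pending.set_result("late reply")  # a late reply is consumed quietly
    await asyncio.sleep(0)
    return timed_out, was_cancelled


shield_result = asyncio.run(demo())
print(shield_result)
```

The trade-off is that the abandoned call stays pending on the client until the server replies, so a server that never answers leaves one unresolved future per timed-out call.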

KSDaemon commented, Dec 13, 2022

Oh, thanks for the description of how it works under the hood.

the problem is: the use of the specific subfeature CANCEL happens when the original CALL feature already is in use

IOW: the client cancel is a function on the deferred/future for the already issued call.

if we fail the client side of the call when the cancel fails, then what about the result we’ll receive later?

1. we could throw it away client side (but this needs more code, since we check for results for no-longer active calls and bail out).

2. we could also not fail the call but silently ignore the request to initiate canceling remotely …

so you argue for 1., right?

Well, I think both points are legitimate and just complement each other.

The client wants to cancel the call, so it just issues cancel(), maybe wrapped in a try/catch. But what does it mean for client code if cancel fails? Not much, I think. After issuing cancel() the client no longer cares about the results. That should work transparently for the client regardless of which router is in use and whether or not other peers support call canceling. That’s the nature of WAMP: one peer doesn’t know much, and doesn’t need to care much, about other peers, where they are running, or how. Call canceling is one of those features that can be implemented with graceful degradation, so even if other peers do not support it we should handle that on our side. For the client, the logic then stays the same whether the router is feature-rich or basic.

What if the client issues a call cancel and we return an error because the feature is not supported? Okay, what then? Does the client still need to wait for the results? Or will we receive the results some time later and fail the whole connection because there is no related call? I think that’s not convenient for the end user, and there is no reason to fail the whole connection.
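The combination argued for above (drop the call locally, send a cancel only if the router supports it, and quietly discard a late reply) could look roughly like the sketch below. Everything in it is hypothetical: `CallTracker`, `ProtocolError`, and the returned action strings are illustrative names, not Autobahn's actual implementation.

```python
class ProtocolError(Exception):
    """Raised for a reply that matches no call we know about."""


class CallTracker:
    """Hypothetical client-side bookkeeping for outstanding calls."""

    def __init__(self):
        self._active = {}        # request_id -> list that receives the result
        self._abandoned = set()  # cancelled locally, reply may still arrive

    def start(self, request_id):
        sink = []
        self._active[request_id] = sink
        return sink

    def cancel(self, request_id, router_supports_cancel):
        # Stop caring about the call locally, whatever the router supports.
        self._active.pop(request_id)
        self._abandoned.add(request_id)
        # Only a router that announced call canceling is sent a CANCEL.
        return "send CANCEL" if router_supports_cancel else "send nothing"

    def on_result(self, request_id, value):
        if request_id in self._abandoned:
            self._abandoned.discard(request_id)  # late reply: drop it quietly
            return "discarded"
        if request_id not in self._active:
            raise ProtocolError(f"result for unknown request {request_id}")
        self._active.pop(request_id).append(value)
        return "delivered"


tracker = CallTracker()
sink_a = tracker.start(1)
action = tracker.cancel(1, router_supports_cancel=False)
late = tracker.on_result(1, "too late")
sink_b = tracker.start(2)
normal = tracker.on_result(2, "ok")
print(action, late, normal)
```

With this bookkeeping, a reply for a locally cancelled call is no longer treated as a protocol violation, while a reply for a request that was never issued still is.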

tomelliff commented, Dec 13, 2022

I don’t know enough about WAMP/Autobahn to really weigh in on your design discussion, but I was curious about why I receive the error on the follow-up call and not when the previous call is cancelled?

I’d expect to have something like this:

sequenceDiagram
    Client->>+Server: Call RPC A
    Note over Server: Server blocks
    Note over Client: Client times out the call of RPC A
    Client->>+Server: Cancel RPC A
    Server-->>+Client: Return bad protocol for unsupported cancellation
    Note over Client: Client throws bad protocol error

but it feels more like this:

sequenceDiagram
    Client->>+Server: Call RPC A
    Note over Server: Server blocks
    Note over Client: Client times out the call of RPC A
    Client->>+Server: Cancel RPC A
    Client->>+Server: Call RPC B
    Server-->>+Client: Return bad protocol for unsupported cancellation
    Note over Client: Client throws bad protocol error