Subscription errors and does not reconnect when connection is cancelled by server

After upgrading @google-cloud/pubsub to escape a CPU issue in @grpc/grpc-js, we found that services using a long-running Pub/Sub subscription began to crash periodically, both during local development and in GKE.

Issue

When the pubsub service closes the underlying HTTP/2 connection (this happens after 15 minutes), the connected subscription instance emits an error event and does not reconnect.

In a previous version of the library, the subscription instance would reconnect (and wouldn’t emit an error event). However, it looks like the CANCELLED status has since been removed from the RETRY_CODES list in src/pull-retry.ts, which means we skip the retry block and go on to destroy the stream with a StatusError in src/message-stream.ts.

Since the pubsub service reserves the right to cancel a connection at any time, I would expect this library to reconnect when the service cancels the connection, and not to emit an error event on the subscription instance.
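To make the effect of that change concrete, here is a minimal sketch of the retry decision described above. It is not the library's code: the `Status` map, both retry lists, and the `onStreamEnd` helper are hypothetical names made up for illustration. Only the numeric gRPC status values are standard.

```javascript
// Numeric gRPC status codes (these values are fixed by gRPC itself).
const Status = {
  CANCELLED: 1,
  DEADLINE_EXCEEDED: 4,
  RESOURCE_EXHAUSTED: 8,
  ABORTED: 10,
  INTERNAL: 13,
  UNAVAILABLE: 14,
};

// Hypothetical retry lists for illustration; NOT copied from the library.
const retryCodesWithoutCancelled = [
  Status.DEADLINE_EXCEEDED,
  Status.RESOURCE_EXHAUSTED,
  Status.ABORTED,
  Status.INTERNAL,
  Status.UNAVAILABLE,
];
const retryCodesWithCancelled = [...retryCodesWithoutCancelled, Status.CANCELLED];

// Hypothetical helper: decide what happens when a stream ends with `status`.
function onStreamEnd(status, retryCodes) {
  if (retryCodes.includes(status.code)) {
    // Retryable: the stream would be re-established quietly.
    return { action: 'reconnect' };
  }
  // Not retryable: the stream is torn down and the error surfaces to the user.
  const error = new Error(status.details);
  error.code = status.code;
  return { action: 'destroy', error };
}

const cancelled = { code: Status.CANCELLED, details: 'Call cancelled' };
console.log(onStreamEnd(cancelled, retryCodesWithoutCancelled).action); // destroy
console.log(onStreamEnd(cancelled, retryCodesWithCancelled).action); // reconnect
```

With CANCELLED missing from the list, the same server-side cancellation goes from a silent reconnect to a user-visible error, which matches the behaviour change reported here.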

The simplest workaround I’ve found so far is to manually add the retry code for CANCELLED back into the RETRY_CODES list before requiring @google-cloud/pubsub proper:

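// Workaround: put CANCELLED (gRPC status code 1) back into the retryable codes.
// This must run before '@google-cloud/pubsub' itself is required.
const { RETRY_CODES } = require('@google-cloud/pubsub/build/src/pull-retry');
RETRY_CODES.push(1);

const { PubSub } = require('@google-cloud/pubsub');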

Environment details

  • OS: GKE
  • Node.js version: 13.11.0
  • package versions:
    • @google-cloud/pubsub: 1.7.2
    • google-gax: 1.15.2
    • @grpc/grpc-js: 0.7.9

Steps to reproduce

The simplest way to reproduce this issue is to create a subscription to a topic using the pubsub-emulator and wait 15 minutes.

Example reproduction code:

const { PubSub } = require('@google-cloud/pubsub');

const options = {
  projectId: 'local-dev',
  apiEndpoint: 'localhost:8085',
};

const pubsub = new PubSub(options);

const topicName = 'test-topic';
const subscriptionName = 'test-topic-subscription';

const run = async () => {
  const topicDefinition = pubsub.topic(topicName);

  await topicDefinition.get({ autoCreate: true });

  const subscriptionDefinition = topicDefinition.subscription(subscriptionName);

  const [subscription] = await subscriptionDefinition.get({ autoCreate: true });

  subscription.on('message', (message) => {
    console.log('message received', { message });
  });

  subscription.on('error', (error) => {
    console.error(`Error in pubsub subscription: ${error.message}`, {
      error, topicName, subscriptionName,
    });
  });

  subscription.on('close', () => {
    console.error('Pubsub subscription closed', {
      topicName, subscriptionName,
    });
  });

  console.log('handlers added to subscription');
};

// Log setup failures instead of leaving an unhandled promise rejection.
run().catch(console.error);

Running the above script without sending any messages to the topic yielded this error after 15 minutes:

Error in pubsub subscription: Call cancelled {
  error: StatusError: Call cancelled
      at MessageStream._onEnd ([...]/Development/temp/testing-pubsub/node_modules/@google-cloud/pubsub/build/src/message-stream.js:234:26)
      at MessageStream._onStatus ([...]/Development/temp/testing-pubsub/node_modules/@google-cloud/pubsub/build/src/message-stream.js:271:18)
      at ClientDuplexStreamImpl.<anonymous> ([...]/Development/temp/testing-pubsub/node_modules/@google-cloud/pubsub/build/src/message-stream.js:143:44)
      at Object.onceWrapper (events.js:422:26)
      at ClientDuplexStreamImpl.emit (events.js:315:20)
      at Object.onReceiveStatus ([...]/Development/temp/testing-pubsub/node_modules/@grpc/grpc-js/build/src/client.js:391:24)
      at Object.onReceiveStatus ([...]/Development/temp/testing-pubsub/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:303:181)
      at Http2CallStream.outputStatus ([...]/Development/temp/testing-pubsub/node_modules/@grpc/grpc-js/build/src/call-stream.js:114:27)
      at Http2CallStream.maybeOutputStatus ([...]/Development/temp/testing-pubsub/node_modules/@grpc/grpc-js/build/src/call-stream.js:153:22)
      at Http2CallStream.endCall ([...]/Development/temp/testing-pubsub/node_modules/@grpc/grpc-js/build/src/call-stream.js:140:18) {
    code: 1,
    details: 'Call cancelled',
    metadata: Metadata { internalRepr: Map(0) {}, options: {} }
  },
  topicName: 'test-topic',
  subscriptionName: 'test-topic-subscription'
}
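The error object above carries `code: 1`, so an application-level handler can at least tell a server-side cancellation apart from other failures. The sketch below is one way to do that, under stated assumptions: `watchForCancellation` and its callbacks are hypothetical names, and a plain `EventEmitter` stands in for the subscription so the snippet runs without the library or the emulator. What to do on cancellation (for example, re-creating the subscription listener) is left to the caller.

```javascript
const { EventEmitter } = require('events');

const GRPC_CANCELLED = 1; // gRPC status code for CANCELLED

// Hypothetical helper: route CANCELLED errors to one callback and
// every other error to another.
function watchForCancellation(emitter, onCancelled, onOtherError) {
  emitter.on('error', (error) => {
    if (error && error.code === GRPC_CANCELLED) {
      onCancelled(error);
    } else {
      onOtherError(error);
    }
  });
}

// Stand-in for a subscription; only the 'error' event is modelled here.
const fakeSubscription = new EventEmitter();
const seen = [];

watchForCancellation(
  fakeSubscription,
  (error) => seen.push(`cancelled: ${error.message}`),
  (error) => seen.push(`other: ${error.message}`)
);

fakeSubscription.emit('error', Object.assign(new Error('Call cancelled'), { code: 1 }));
fakeSubscription.emit('error', Object.assign(new Error('Not found'), { code: 5 }));

console.log(seen); // [ 'cancelled: Call cancelled', 'other: Not found' ]
```

This does not restore automatic reconnection. It only separates the expected cancellation from real errors so the two can be logged or handled differently.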

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 37 (9 by maintainers)

Top GitHub Comments

feywind commented, Oct 16, 2020 (3 reactions)

Hey, good(?) news!

2020-10-16T21:50:36.408Z | call_stream | [2] HTTP/2 stream closed with code 8
2020-10-16T21:50:36.408Z | call_stream | [2] ended with status: code=1 details="Call cancelled"
2020-10-16T21:50:36.409Z | subchannel | 127.0.0.1:8085 callRefcount 5 -> 4
STREAM unpipe 2020-10-16T21:50:36.409Z was running for 00:14:58.988

I only saw this with grpc-js, which is also an interesting data point. I’ll be bugging a grpc person about this probably.

zombieleet commented, Nov 20, 2020 (2 reactions)

We are currently experiencing this same issue using the gcloud-sdk Docker image. What’s a possible fix? @feywind
