Allow easy paging for list operations

Taking the comment on https://github.com/GoogleCloudPlatform/gcloud-python/pull/889/files#r31042158 over here.

Requiring explicit paging for topics is really ugly; there must be a better way than:

all_topics = []

topics, token = client.list_topics()
all_topics.extend(topics)
while token is not None:
  topics, token = client.list_topics(page_token=token)
  all_topics.extend(topics)

If our concern is “people could make a boatload of requests and they don’t realize it”, we could always limit this like we do with the recursive deleting in storage?

topics = client.list_topics(limit=100)

We can do a bunch of this with the page_size parameter, but that might mean that I have to wait for everything to come back before starting any work, which seems kind of ridiculous.

It’d be really nice if both the limit and the page size were available, so it’s easy to do things like “I want the first 5000 topics, and I want to pull them from the server in chunks of 50”:

for topic in client.list_topics(page_size=50, limit=5000):
  push_work_to_other_system(topic)
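
One way the page_size and limit behaviour described above could be layered on top of the existing (items, token) calls is a small generator. This is only a sketch: iter_items and fake_list_topics are hypothetical names, and the fake call stands in for a real paged API that returns a list of items plus a next-page token (None when there are no more pages).

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]


def iter_items(
    fetch_page: Callable[..., Page],
    page_size: Optional[int] = None,
    limit: Optional[int] = None,
    page_token: Optional[str] = None,
) -> Iterator[str]:
    """Yield items one at a time, requesting a new page only when needed."""
    yielded = 0
    while limit is None or yielded < limit:
        items, page_token = fetch_page(page_size=page_size, page_token=page_token)
        for item in items:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1
        if page_token is None:
            # The server reported that this was the last page.
            return


def fake_list_topics(page_size=None, page_token=None) -> Page:
    """Hypothetical stand-in for a paged API call over 12 topics."""
    all_topics = [f"topic-{i}" for i in range(12)]
    start = int(page_token) if page_token is not None else 0
    end = start + (page_size or 5)
    next_token = str(end) if end < len(all_topics) else None
    return all_topics[start:end], next_token


# Work can start as soon as the first page arrives.
for topic in iter_items(fake_list_topics, page_size=5, limit=7):
    print(topic)
```

Because the generator is lazy, breaking out of the loop early means later pages are never requested.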

To add a bit more context, I’d like to toss out: what if we made all of our list operations return iterators?

The use cases I see here are…

I want everything, give it all to me (for topic in list_topics())
I want up to N things, stop giving them to me then (for topic in list_topics(limit=100))
I don’t know how many I want, I’ll know when I want to stop though… (for topic in list_topics(): if topic.name == 'foo': break)
Combination of the previous two (I don’t know when I want to stop, but don’t let me go on forever, kill it at say… 1000)
I want to pick up where I left off, I saved a token somewhere (sort of like offset)! (for topic in list_topics(page_token=token, limit=100))

The “let’s just always return page, page_token” thing doesn’t really make all of those use-cases all that fun… But if we always return iterators, they are all easy.
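
To illustrate why the use cases above become easy once a list call returns an iterator, here is a rough sketch using only the standard library. The list_topics generator below is a hypothetical stand-in that yields plain strings, not the real client method.

```python
from itertools import islice


def list_topics():
    """Hypothetical stand-in for a list call that returns an iterator."""
    for i in range(2000):
        yield f"topic-{i}"


# I want everything.
everything = list(list_topics())

# I want up to N things.
first_100 = list(islice(list_topics(), 100))

# I'll know when I want to stop.
until_match = []
for topic in list_topics():
    if topic == "topic-3":
        break
    until_match.append(topic)

# Stop on a condition, but never go past a hard cap of 1000 items.
capped = []
for topic in islice(list_topics(), 1000):
    if topic == "topic-1500":
        break
    capped.append(topic)
```

Note that the cap in the last case comes from itertools.islice rather than from a limit parameter on the call itself; either approach covers the use case.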

Further, let’s say I have a weird case where I just want one page worth of stuff… list_topics().get_current_page() could return what you want, no?
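
The get_current_page() idea could look something like the sketch below. PagedIterator and fake_fetch are hypothetical names, and the class assumes a fetch function that returns (items, next_token); it is meant for a single pass over the results.

```python
class PagedIterator:
    """Iterate over all items, while still exposing the current page."""

    def __init__(self, fetch_page, page_token=None):
        self._fetch_page = fetch_page
        self._next_token = page_token
        self._current_page = None

    def get_current_page(self):
        """Return the current page, fetching it on first access."""
        if self._current_page is None:
            self._current_page, self._next_token = self._fetch_page(
                page_token=self._next_token
            )
        return self._current_page

    def __iter__(self):
        while True:
            yield from self.get_current_page()
            if self._next_token is None:
                return
            # Drop the finished page so the next one is fetched lazily.
            self._current_page = None


def fake_fetch(page_token=None):
    """Hypothetical paged call serving two fixed pages."""
    pages = {None: (["a", "b"], "p2"), "p2": (["c"], None)}
    return pages[page_token]


one_page = PagedIterator(fake_fetch).get_current_page()
all_items = list(PagedIterator(fake_fetch))
```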

Issue Analytics

Created 8 years ago
Comments: 32 (32 by maintainers)

Top GitHub Comments

2 reactions

rimey commented, Apr 12, 2016

Here’s my input, for what it’s worth.

I agree that it is important to provide a simple interface that doesn’t require the user to deal with page tokens. If we don’t, people will often choose to fetch just the first page. I know I’ve done that.

Whether it is better to return a list or an iterator will depend on the API, as will whether it makes sense to offer a limit parameter.

Default limits are rarely going to be appropriate. A default limit on PubSub topics would surely be bad, for instance. Failing to override such a default with limit=None would almost always be an insidious bug.

If listing PubSub topics returns an iterator, I actually don’t think it makes sense to offer a limit parameter at all. I can’t think of a use-case for it. If the user really did want to retrieve some limited number of arbitrary topics, they could do that by terminating the iteration early.

1 reaction

tseaver commented, Oct 3, 2016

My $0.02 on the strategy:

Rename the current paging methods, from list_foo -> list_foo_paged.
Add sibling list_foo methods which just return an iterator already wrapped around list_foo_paged. These methods do not take parameters for every possible iterator property; instead, users who care can configure the iterator before beginning to iterate.
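
A minimal sketch of that strategy, under assumptions: FakeClient, TopicIterator, and the page_size and max_items attributes are hypothetical names chosen for illustration, and the paged method is assumed to return (items, next_token).

```python
class TopicIterator:
    """Iterator over all items; settings may be adjusted before iterating."""

    def __init__(self, paged_method):
        self._paged_method = paged_method
        self.page_size = None  # items requested per call (None: server default)
        self.max_items = None  # stop after this many items (None: no cap)

    def __iter__(self):
        token, count = None, 0
        while True:
            items, token = self._paged_method(
                page_size=self.page_size, page_token=token
            )
            for item in items:
                if self.max_items is not None and count >= self.max_items:
                    return
                yield item
                count += 1
            if token is None:
                return


class FakeClient:
    """Hypothetical client holding 7 topics in memory."""

    _topics = [f"topic-{i}" for i in range(7)]

    def list_topics_paged(self, page_size=None, page_token=None):
        """The renamed paging method: returns (items, next_token)."""
        start = int(page_token or 0)
        end = start + (page_size or 3)
        token = str(end) if end < len(self._topics) else None
        return self._topics[start:end], token

    def list_topics(self):
        """The sibling method: returns an iterator with no extra parameters."""
        return TopicIterator(self.list_topics_paged)


client = FakeClient()
topics = client.list_topics()
topics.page_size = 2  # configured before iteration begins
topics.max_items = 5
for topic in topics:
    print(topic)
```

Keeping list_topics free of paging parameters means the simple case stays a plain for loop, while list_topics_paged remains available to callers who want to manage tokens themselves.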