A better way to get all friends than Cursor
The typical way of getting all friends is:
import tweepy

# api is an authenticated tweepy.API instance
friends_list = []
for friend in tweepy.Cursor(api.friends).items():
    friends_list.append(friend)
However, this is limited to a rate of 200/15 min, which makes it very time-consuming when there are hundreds or even thousands of friends (or followers, blockers, etc.).
API.get_friends_ids(), on the other hand, returns a list of up to 5,000 user IDs at a time, at most once per 15 minutes (5,000/15 min), and API.get_user() has a limit of 900/15 min. Combining these two functions produces exactly the same results as tweepy.Cursor, while saving a great deal of time. Could this method be implemented in tweepy?
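A minimal sketch of the combination proposed above. The method names assume Tweepy v4 (API.get_friend_ids) and, instead of calling get_user once per ID, it uses the bulk users/lookup endpoint (API.lookup_users, up to 100 IDs per request); adjust the names for your Tweepy version.

```python
def batch(ids, size=100):
    """Yield successive ID batches; users/lookup accepts up to 100 IDs per call."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def get_all_friends(api, screen_name):
    """api is assumed to be an authenticated tweepy.API instance."""
    import tweepy  # imported here so the batch() helper stays dependency-free

    # Step 1: collect numeric friend IDs, up to 5,000 per request.
    ids = []
    for page in tweepy.Cursor(api.get_friend_ids, screen_name=screen_name).pages():
        ids.extend(page)

    # Step 2: hydrate the IDs into full User objects, 100 at a time.
    users = []
    for chunk in batch(ids):
        users.extend(api.lookup_users(user_id=chunk))
    return users
```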
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 4
- Comments: 9 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I develop in Python and I’ve got some solution for this stuff.
There are scripts to get friends/followers. They get blocks of 5k, up to 14 times in 15 minutes. I leave one token free, saves me from grief if some other process wants to use one. When they get down to a single token, they sleep(60), then check API, until they can proceed again. The numeric IDs are placed in a named Redis queue. I used Walrus over the Redis queue so I’m dealing with native Python objects.
The Redis queue name corresponds to an Elasticsearch index and an ArangoDB graph. I’ve been slowly moving things from Elastic or Redis into ArangoDB or RabbitMQ, when that makes sense. I’ve got tokens for several dozen accounts, and they’re employed by a numeric ID to user object resolver, which then writes a minimized portion of the JSON blob to an Elasticsearch index. They are allowed 900 lookups/15 minutes, so I permit them only 800 per period.
A single account doing this can collect 280k followers in an hour. The resolvers run in parallel, so they’ll handle about 200k an hour. As a beneficial parallel process, if you stream a set of accounts and pull the user objects out of the stream, it’ll get you the active accounts w/o an API load.
The API, at least to me, seems to be biased toward mobile developers who aren’t going to do thousands of operations. If you’re going bigger, you’re going to need to cache and to do things in parallel. Sunsetting the 75k/15 min friends/followers limit and replacing it with 15k/15 min would be … a huge hassle for me. I periodically look for ways to start the retrieval of those relationships somewhere other than the beginning of the set - it’s really difficult to do large accounts with the 280k/hour cap I put on things.
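The "leave one token free, then sleep(60) and re-check" pattern from the comment above can be sketched roughly like this (a hypothetical helper, assuming a tweepy-style rate_limit_status() method; the resource and endpoint names are illustrative):

```python
import time

def wait_for_budget(api, resource="friends", endpoint="/friends/ids", reserve=1):
    """Sleep in 60-second steps while only `reserve` calls remain
    in the current rate-limit window, then report the remaining budget."""
    while True:
        status = api.rate_limit_status(resources=resource)
        remaining = status["resources"][resource][endpoint]["remaining"]
        if remaining > reserve:
            return remaining  # enough budget left to proceed
        time.sleep(60)  # re-check once a minute until the window resets
```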
Hi! Thank you for your suggestion.
I just tried it and the count parameter worked, so by adding this the limit can be expanded from the default 300/15 min (not 200) to 3,000/15 min. That’s actually much better than my suggested solution. Thanks for sharing this!
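The arithmetic behind those numbers, assuming the standard friends/list limit of 15 requests per 15-minute window and the documented maximum page size of 200 for the count parameter:

```python
REQUESTS_PER_WINDOW = 15   # friends/list: 15 requests per 15-minute window
DEFAULT_PAGE_SIZE = 20     # users returned per page when count is not given
MAX_PAGE_SIZE = 200        # documented maximum for the count parameter

default_rate = REQUESTS_PER_WINDOW * DEFAULT_PAGE_SIZE  # users per 15 min
boosted_rate = REQUESTS_PER_WINDOW * MAX_PAGE_SIZE      # users per 15 min
print(default_rate, boosted_rate)  # → 300 3000
```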