Export `iterator` optimizations bypass `prefetch_related` optimizations

When using .iterator(), prefetch_related is not taken into consideration (this is a Django side effect of iterator()).

This is an excerpt from the export method of the Resource class:

        if isinstance(queryset, QuerySet):
            # Iterate without the queryset cache, to avoid wasting memory when
            # exporting large datasets.
            iterable = queryset.iterator()
        else:
            iterable = queryset
        for obj in iterable:
            data.append(self.export_resource(obj))

When queryset is an instance of QuerySet, any prefetch_related defined in a get_queryset() override will be ignored.

Example (source of the code example):

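from django.db import models
from import_export import fields, resources
from taggit.managers import TaggableManager


class Book(models.Model):
    tags = TaggableManager()


class BookResource(resources.ModelResource):
    tags = fields.Field()

    class Meta:
        model = Book

    def dehydrate_tags(self, book):
        # Without a working prefetch, this runs one query per exported book
        return ','.join([tag.name for tag in book.tags.all()])

    def get_queryset(self, queryset=None):
        if queryset is None:
            queryset = Book.objects.all()
        return queryset.prefetch_related('tags')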

Since it is a QuerySet and export uses iterator(), this results in 1 SQL query to fetch the books plus 1 SQL query per row to fetch the tags.
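To make the query counts concrete, here is a minimal sketch that reproduces the two access patterns outside Django, using only the standard-library sqlite3 module. The table layout, sample data, and function names (build_db, export_per_row, export_batched, run_counting_selects) are assumptions made for illustration; this is not what Django or django-import-export executes internally, only the same "1 + N" versus "2 queries" shape.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a few books and tags (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, book_id INTEGER, name TEXT);
        INSERT INTO book (id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO tag (book_id, name) VALUES (1, 'x'), (1, 'y'), (2, 'z');
        """
    )
    return conn


def export_per_row(conn):
    """One query for the books, then one tag query per book (the N+1 shape)."""
    books = conn.execute("SELECT id, title FROM book ORDER BY id").fetchall()
    rows = []
    for book_id, title in books:
        tags = conn.execute(
            "SELECT name FROM tag WHERE book_id = ? ORDER BY id", (book_id,)
        ).fetchall()
        rows.append((title, ",".join(name for (name,) in tags)))
    return rows


def export_batched(conn):
    """One query for the books and one for all their tags (the prefetch shape)."""
    books = conn.execute("SELECT id, title FROM book ORDER BY id").fetchall()
    ids = [book_id for book_id, _ in books]
    placeholders = ",".join("?" * len(ids))
    tag_rows = conn.execute(
        f"SELECT book_id, name FROM tag WHERE book_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    tags_by_book = {}
    for book_id, name in tag_rows:
        tags_by_book.setdefault(book_id, []).append(name)
    return [(title, ",".join(tags_by_book.get(book_id, []))) for book_id, title in books]


def run_counting_selects(conn, fn):
    """Run fn(conn) and return (result, number of SELECT statements executed)."""
    seen = []

    def trace(sql):
        if sql.lstrip().upper().startswith("SELECT"):
            seen.append(sql)

    conn.set_trace_callback(trace)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(seen)


conn = build_db()
per_row, per_row_queries = run_counting_selects(conn, export_per_row)
batched, batched_queries = run_counting_selects(conn, export_batched)
print(per_row_queries, batched_queries)
```

Both functions produce the same rows; only the number of round trips differs, and the per-row variant grows with the size of the export.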

The workaround I found was to convert the queryset to a list, which bypasses the iterator() logic in export:

    def export(self, queryset=None):
        queryset = self.get_queryset(queryset)
        fetched_queryset = list(queryset)
        return super().export(fetched_queryset)

With this fix I now have only 2 SQL queries.

It works but it is not a clean solution.

Suggested fix:

  • In export, check whether the queryset has prefetch_related lookups and do not use iterator() in that case
  • Or add some sort of configuration option, with documentation, to avoid this kind of pointless and complicated override logic.
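The first suggestion could look roughly like the sketch below. It is not the library's code: choose_iterable is a hypothetical helper, FakeQuerySet is a stand-in so the example runs without Django, and the check relies on the private _prefetch_related_lookups attribute of Django querysets, which may change between versions. Duck typing is used here in place of the isinstance(queryset, QuerySet) test from the excerpt above, purely to keep the example self-contained.

```python
def choose_iterable(queryset):
    """Use iterator() only when no prefetch_related lookups would be lost."""
    # Assumption: Django stores prefetch lookups in this private attribute.
    has_prefetch = bool(getattr(queryset, "_prefetch_related_lookups", ()))
    iterator = getattr(queryset, "iterator", None)
    if callable(iterator) and not has_prefetch:
        # No prefetch to lose, so skip the queryset cache to save memory.
        return iterator()
    # Either a plain iterable or a queryset with prefetch lookups:
    # iterate it normally so the prefetch is honored.
    return queryset


class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet, for illustration only."""

    def __init__(self, rows, prefetch_lookups=()):
        self._rows = list(rows)
        self._prefetch_related_lookups = tuple(prefetch_lookups)
        self.iterator_calls = 0

    def iterator(self):
        self.iterator_calls += 1
        return iter(self._rows)

    def __iter__(self):
        return iter(self._rows)


plain = FakeQuerySet(["book-1", "book-2"])
with_prefetch = FakeQuerySet(["book-1", "book-2"], prefetch_lookups=("tags",))
print(list(choose_iterable(plain)), plain.iterator_calls)
print(list(choose_iterable(with_prefetch)), with_prefetch.iterator_calls)
```

The trade-off is the one the issue describes: a queryset with prefetch lookups is fully cached in memory again, so very large exports lose the memory benefit of iterator().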

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 12
  • Comments: 16 (1 by maintainers)

Top GitHub Comments

kingbuzzman commented, Dec 20, 2018 (12 reactions)

@jrobichaud I have a working solution:

from django.core.paginator import Paginator


def page_queryset(queryset, per_page=2000):
    """
    In a perfect world you would use queryset.iterator(), but it fails to load any
    prefetch_related() fields specified. Using Paginator() we can mimic the same functionality.

    > Note that if you use iterator() to run the query, prefetch_related() calls will be ignored since these two
    > optimizations do not make sense together.

    https://docs.djangoproject.com/en/2.0/ref/models/querysets/
    """
    if queryset._prefetch_related_lookups:
        if not queryset.ordered:
            # Paginator() emits a warning if the queryset is unordered
            queryset = queryset.order_by('pk')
        paginator = Paginator(queryset, per_page)
        # Each page is evaluated as its own query, so the prefetch runs once per page
        for index in range(paginator.num_pages):
            yield from paginator.get_page(index + 1)
    else:
        yield from queryset.iterator(chunk_size=per_page)

PS. I should state, I do not use django-import-export – I was just googling “django prefetch_related iterator” and this issue came up.
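The paging idea behind page_queryset can be illustrated without Django. In the sketch below, iter_in_pages and make_list_fetcher are hypothetical names, and a plain list stands in for the database; the point is only that rows are pulled one page at a time, so at most per_page objects (plus whatever was prefetched for them) need to be held at once. As far as I know, Django 4.1 and later also honor prefetch_related() with iterator() when chunk_size is passed, which may make this workaround unnecessary on newer versions.

```python
import math


def iter_in_pages(fetch_page, total, per_page=2000):
    """Yield items lazily, fetching one page at a time.

    fetch_page(page_number, per_page) is assumed to return the rows of a
    1-based page, similar in spirit to Paginator.get_page().
    """
    num_pages = math.ceil(total / per_page)
    for page_number in range(1, num_pages + 1):
        yield from fetch_page(page_number, per_page)


def make_list_fetcher(rows, calls):
    """Build a fetch_page function backed by a list, recording each page request."""

    def fetch_page(page_number, per_page):
        calls.append(page_number)
        start = (page_number - 1) * per_page
        return rows[start:start + per_page]

    return fetch_page


rows = ["book-%d" % i for i in range(5)]
calls = []
for item in iter_in_pages(make_list_fetcher(rows, calls), len(rows), per_page=2):
    print(item, calls)
```

Because the generator is lazy, a page is requested only when the consumer reaches it, which is what keeps memory bounded during a large export.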

Blejwi commented, Apr 8, 2019 (2 reactions)

Same issue here, it would be nice to have this option configurable, because in some cases exporting breaks sensible timeouts for HTTP requests.

