Bug in handling select_related with limit and all() because of many2many relation

When a model has a many-to-many relationship and you fetch records together with the related models while applying a limit, .all() returns the wrong number of records.

I traced the issue to the point where the instances are merged in modelproxy.py: merge_instances_list(result_rows).

On entering merge_instances_list, result_rows contains all the records returned by the query, but the query appears to return multiple rows for the same instance, one for each related record in the many-to-many relation.

This is unexpected behavior: I’d expect to get all the requested rows from the database, not just the instances that remain after the limited record set is grouped.

Here’s a test setup that reproduces the issue:

from typing import List, Optional

import ormar
import pytest

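# Reporter's own project module: provides the databases.Database instance (db),
# the SQLAlchemy engine and the shared MetaData used by the models below.
from app.db.database import db, engine, metadata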


class Keyword(ormar.Model):
    class Meta:
        metadata = metadata
        database = db
        tablename = "keywords"

    id: int = ormar.Integer(primary_key=True)
    name: str = ormar.String(max_length=50)


class KeywordPrimaryModel(ormar.Model):
    class Meta:
        metadata = metadata
        database = db
        tablename = "primary_models_keywords"

    id: int = ormar.Integer(primary_key=True)


class PrimaryModel(ormar.Model):
    class Meta:
        metadata = metadata
        database = db
        tablename = "primary_models"

    id: int = ormar.Integer(primary_key=True)
    name: str = ormar.String(max_length=255, index=True)
    some_text: str = ormar.Text()
    some_other_text: Optional[str] = ormar.Text(nullable=True)
    keywords: Optional[List[Keyword]] = ormar.ManyToMany(
        Keyword, through=KeywordPrimaryModel
    )


class SecondaryModel(ormar.Model):
    class Meta:
        metadata = metadata
        database = db
        tablename = "secondary_models"

    id: int = ormar.Integer(primary_key=True)
    name: str = ormar.String(max_length=100)
    primary_model: PrimaryModel = ormar.ForeignKey(
        PrimaryModel,
        related_name="secondary_models",
    )


@pytest.mark.asyncio
@pytest.mark.parametrize("tag_id", [1, 2, 3, 4, 5])
async def test_create_keywords(tag_id):
    await Keyword.objects.create(name=f"Tag {tag_id}")


@pytest.mark.asyncio
@pytest.mark.parametrize(
    "name, some_text, some_other_text",
    [
        ("Primary 1", "Some text 1", "Some other text 1"),
        ("Primary 2", "Some text 2", "Some other text 2"),
        ("Primary 3", "Some text 3", "Some other text 3"),
        ("Primary 4", "Some text 4", "Some other text 4"),
        ("Primary 5", "Some text 5", "Some other text 5"),
        ("Primary 6", "Some text 6", "Some other text 6"),
        ("Primary 7", "Some text 7", "Some other text 7"),
        ("Primary 8", "Some text 8", "Some other text 8"),
        ("Primary 9", "Some text 9", "Some other text 9"),
        ("Primary 10", "Some text 10", "Some other text 10"),
    ],
)
async def test_create_primary_models(name, some_text, some_other_text):
    await PrimaryModel(
        name=name, some_text=some_text, some_other_text=some_other_text
    ).save()


@pytest.mark.asyncio
async def test_add_keywords():
    p1 = await PrimaryModel.objects.get(pk=1)

    p2 = await PrimaryModel.objects.get(pk=2)

    for i in range(1, 6):
        keyword = await Keyword.objects.get(pk=i)
        if i % 2 == 0:
            await p1.keywords.add(keyword)
        else:
            await p2.keywords.add(keyword)


@pytest.mark.asyncio
async def test_create_secondary_model():
    secondary = await SecondaryModel(name="Foo", primary_model=1).save()
    assert secondary.id == 1
    assert secondary.primary_model.id == 1


@pytest.mark.asyncio
async def test_list_primary_models_with_keywords_and_limit():
    models = await PrimaryModel.objects.select_related("keywords").limit(5).all()

    # This test fails because of the keywords relation (len(models) is 2).
    assert len(models) == 5


@pytest.mark.asyncio
async def test_list_primary_models_without_keywords_and_limit():
    models = await PrimaryModel.objects.all()
    assert len(models) == 10


@pytest.mark.asyncio
async def test_list_primary_models_without_keywords_but_with_limit():
    models = await PrimaryModel.objects.limit(5).all()
    assert len(models) == 5


@pytest.mark.asyncio
async def test_update_secondary():
    secondary = await SecondaryModel.objects.get(id=1)
    assert secondary.name == "Foo"
    await secondary.update(name="Updated")
    assert secondary.name == "Updated"


@pytest.fixture(autouse=True, scope="module")
def create_test_database():
    metadata.create_all(engine)
    yield
    metadata.drop_all(engine)

Here the test fails: len(models) is 2, not the expected 5.

The grouping should probably happen in the query so that the expected number of records is returned.

Issue Analytics

Created 3 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction

soderluk commented, Dec 14, 2020

@collerek Now that I’m fully awake again, I’m not entirely sure how that would be done after all. You are correct: there must also be some processing on the Python side.

1 reaction

collerek commented, Dec 11, 2020

OK, so this is one of the reasons why, for example, Django doesn’t allow select_related on M2M fields and reverse FKs.

If you issue a select_related query, one (potentially huge) joined query is constructed, and all the usual SQL clauses are applied to the whole joined query, so offset, limit, where, etc. are applied at the end. Because the raw SQL response is a join across multiple tables, the parent’s values are duplicated on every row for each of its children. A limit applies to those raw SQL rows (that’s how SQL LIMIT works), which is why you get the first 5 rows of data: they cover only the first 2 PrimaryModels, which between them have 5 children and so consume the whole limit.
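To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than ormar. It assumes a simplified schema (the table and column names are illustrative, not ormar's generated DDL) loaded with the same data as the test: keywords 2 and 4 belong to Primary 1, keywords 1, 3 and 5 to Primary 2. A single joined query with LIMIT 5 returns five joined rows that cover only two distinct primary models.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory copy of the test data (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE primary_models (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE primary_models_keywords (
            id INTEGER PRIMARY KEY,
            primary_model_id INTEGER REFERENCES primary_models (id),
            keyword_id INTEGER REFERENCES keywords (id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO primary_models (id, name) VALUES (?, ?)",
        [(i, f"Primary {i}") for i in range(1, 11)],
    )
    conn.executemany(
        "INSERT INTO keywords (id, name) VALUES (?, ?)",
        [(i, f"Tag {i}") for i in range(1, 6)],
    )
    # Same split as test_add_keywords: even tags -> model 1, odd tags -> model 2.
    conn.executemany(
        "INSERT INTO primary_models_keywords (primary_model_id, keyword_id) VALUES (?, ?)",
        [(1 if k % 2 == 0 else 2, k) for k in range(1, 6)],
    )
    return conn


# A single joined query, like the one select_related produces; LIMIT counts joined rows.
JOINED_WITH_LIMIT = """
    SELECT p.id, p.name, k.id
    FROM primary_models AS p
    LEFT JOIN primary_models_keywords AS pk ON pk.primary_model_id = p.id
    LEFT JOIN keywords AS k ON k.id = pk.keyword_id
    ORDER BY p.id, k.id
    LIMIT 5
"""

conn = build_db()
rows = conn.execute(JOINED_WITH_LIMIT).fetchall()
# Collapsing duplicate parent rows leaves only the distinct primary models.
parent_ids = list(dict.fromkeys(row[0] for row in rows))
print(len(rows), parent_ids)  # 5 [1, 2]
```

The ORDER BY is there only to make the row order deterministic for the demonstration.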

To limit this to 5 primary models, I would have to know in advance either how many children the parents have (and the children’s children, and so on, for a multi-level join) or fetch the ids of those parents first. Both are possible, but both require an additional query against the database.
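The second option, fetching the parent ids first, could look roughly like this. It is again a sqlite3 sketch on a hypothetical simplified schema, not how ormar is implemented, and fetch_with_parent_limit is a made-up helper name: the first query applies the limit to primary models only, and the joined query is then restricted to those ids and runs without a LIMIT.

```python
import sqlite3
from typing import List, Tuple


def build_db() -> sqlite3.Connection:
    """In-memory copy of the test data (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE primary_models (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE primary_models_keywords (
            id INTEGER PRIMARY KEY,
            primary_model_id INTEGER REFERENCES primary_models (id),
            keyword_id INTEGER REFERENCES keywords (id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO primary_models (id, name) VALUES (?, ?)",
        [(i, f"Primary {i}") for i in range(1, 11)],
    )
    conn.executemany(
        "INSERT INTO keywords (id, name) VALUES (?, ?)",
        [(i, f"Tag {i}") for i in range(1, 6)],
    )
    # Same split as test_add_keywords: even tags -> model 1, odd tags -> model 2.
    conn.executemany(
        "INSERT INTO primary_models_keywords (primary_model_id, keyword_id) VALUES (?, ?)",
        [(1 if k % 2 == 0 else 2, k) for k in range(1, 6)],
    )
    return conn


def fetch_with_parent_limit(conn: sqlite3.Connection, limit: int) -> List[Tuple]:
    """Hypothetical helper: apply the limit to parents, then run the join."""
    # Additional query: pick the parent ids first, so LIMIT counts primary models.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM primary_models ORDER BY id LIMIT ?", (limit,)
        )
    ]
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    # The joined query is restricted to those ids and has no LIMIT of its own.
    return conn.execute(
        f"""
        SELECT p.id, p.name, k.id
        FROM primary_models AS p
        LEFT JOIN primary_models_keywords AS pk ON pk.primary_model_id = p.id
        LEFT JOIN keywords AS k ON k.id = pk.keyword_id
        WHERE p.id IN ({placeholders})
        ORDER BY p.id, k.id
        """,
        ids,
    ).fetchall()


rows = fetch_with_parent_limit(build_db(), 5)
print(len(rows), sorted({row[0] for row in rows}))  # 8 [1, 2, 3, 4, 5]
```

The result now spans exactly five primary models (eight joined rows), at the cost of the extra round trip collerek mentions.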

I could do it in Python, but without knowing either of those in advance I would always have to fetch all the data from the join and then limit the number of parent models in the result list (wasting the rest of the fetched data).
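The Python-side alternative amounts to grouping the unlimited joined rows by parent and stopping after N parents. A minimal sketch, assuming the rows arrive sorted by parent id (as an ORDER BY on the join would produce) as (parent id, parent name, keyword id) tuples; limit_parents is a hypothetical helper, not part of ormar:

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# (parent id, parent name, keyword id or None), as produced by the joined query.
Row = Tuple[int, str, Optional[int]]


def limit_parents(rows: List[Row], limit: int) -> Dict[int, List[int]]:
    """Hypothetical helper: group joined rows by parent, keep the first `limit` parents.

    Assumes `rows` is sorted by parent id, as an ORDER BY on the join would give.
    """
    result: Dict[int, List[int]] = {}
    for parent_id, group in groupby(rows, key=lambda row: row[0]):
        if len(result) == limit:
            break  # every remaining fetched row is discarded
        result[parent_id] = [kw_id for _, _, kw_id in group if kw_id is not None]
    return result


# The full, unlimited join result for the test data (keywords only on models 1 and 2).
all_rows: List[Row] = [
    (1, "Primary 1", 2), (1, "Primary 1", 4),
    (2, "Primary 2", 1), (2, "Primary 2", 3), (2, "Primary 2", 5),
] + [(i, f"Primary {i}", None) for i in range(3, 11)]

print(limit_parents(all_rows, 5))
# {1: [2, 4], 2: [1, 3, 5], 3: [], 4: [], 5: []}
```

Every row past the fifth parent is fetched from the database and then thrown away, which is the waste described above.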

I don’t know if I will implement it, because it might be a huge effort and/or slow everything down quite a lot with that additional query, and select_related is specifically designed to be a quick query with a single database call.

BUT - worry not 😃

That’s one of the reasons why prefetch_related was introduced. Your solution is as simple as changing select_related to prefetch_related in your query, and the test will pass.

The reason is that prefetch_related fetches the related models in separate queries after the initial one completes, and limit/offset apply only to that first query.

So it fetches 5 rows of the primary model and then fetches the child models only for those 5 models. It should be better documented, that’s for sure 😃
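In the failing test, the change amounts to calling prefetch_related("keywords") instead of select_related("keywords") before .limit(5).all(). The two-query strategy itself can be sketched with sqlite3 on the same hypothetical simplified schema; this illustrates the idea and is not ormar's code (prefetch_style is a made-up name). The first query limits primary models, the second fetches keywords only for the ids it returned, and Python attaches them to their parents.

```python
import sqlite3
from typing import Dict, List


def build_db() -> sqlite3.Connection:
    """In-memory copy of the test data (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE primary_models (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE primary_models_keywords (
            id INTEGER PRIMARY KEY,
            primary_model_id INTEGER REFERENCES primary_models (id),
            keyword_id INTEGER REFERENCES keywords (id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO primary_models (id, name) VALUES (?, ?)",
        [(i, f"Primary {i}") for i in range(1, 11)],
    )
    conn.executemany(
        "INSERT INTO keywords (id, name) VALUES (?, ?)",
        [(i, f"Tag {i}") for i in range(1, 6)],
    )
    # Same split as test_add_keywords: even tags -> model 1, odd tags -> model 2.
    conn.executemany(
        "INSERT INTO primary_models_keywords (primary_model_id, keyword_id) VALUES (?, ?)",
        [(1 if k % 2 == 0 else 2, k) for k in range(1, 6)],
    )
    return conn


def prefetch_style(conn: sqlite3.Connection, limit: int) -> Dict[int, List[str]]:
    """Hypothetical helper mimicking the prefetch strategy with two queries."""
    # Query 1: primary models only, so LIMIT counts primary models.
    result: Dict[int, List[str]] = {
        pid: []
        for (pid,) in conn.execute(
            "SELECT id FROM primary_models ORDER BY id LIMIT ?", (limit,)
        )
    }
    if not result:
        return result
    placeholders = ", ".join("?" for _ in result)
    # Query 2: keywords for only those parents; no LIMIT here.
    children = conn.execute(
        f"""
        SELECT pk.primary_model_id, k.name
        FROM primary_models_keywords AS pk
        JOIN keywords AS k ON k.id = pk.keyword_id
        WHERE pk.primary_model_id IN ({placeholders})
        ORDER BY k.id
        """,
        list(result),
    )
    # Attach each child to its already-fetched parent in Python.
    for parent_id, keyword_name in children:
        result[parent_id].append(keyword_name)
    return result


print(prefetch_style(build_db(), 5))
# {1: ['Tag 2', 'Tag 4'], 2: ['Tag 1', 'Tag 3', 'Tag 5'], 3: [], 4: [], 5: []}
```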

Let me know if that solves your issue.
