"KeyError: None" when using JOIN and aggregate_rows in 2.10.2

Combining joins with aggregate_rows can raise an uncaught exception. The failing case involves joining two tables through a junction table: tables A and C are linked by a third table, B. B has two rows, both pointing to the same A; one row references a C object and the other has a null C. The structure looks like this:

    B - C
   / 
A 
   \ 
    B - NULL
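
To make the data shape concrete, here is a minimal sketch using only the standard-library sqlite3 module rather than peewee. The table and column names (a, b, c, a_id, c_id) are assumptions chosen for illustration and are not taken from the issue. It shows that the two left outer joins yield one row per B, and that the second row carries NULL for every C column, which appears consistent with the None key in the error reported below.

```python
import sqlite3

# Assumed schema for illustration only; peewee's generated names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, val REAL);
    CREATE TABLE c (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY,
                    a_id INTEGER REFERENCES a(id),
                    c_id INTEGER REFERENCES c(id));
    INSERT INTO a (id, val) VALUES (1, 3.14);
    INSERT INTO c (id, val) VALUES (1, 42);
    INSERT INTO b (id, a_id, c_id) VALUES (1, 1, 1);     -- B linked to a C
    INSERT INTO b (id, a_id, c_id) VALUES (2, 1, NULL);  -- B with no C
""")

# One output row per B; the same A appears twice.
rows = conn.execute("""
    SELECT a.id, b.id, b.c_id, c.id, c.val
    FROM a
    LEFT OUTER JOIN b ON b.a_id = a.id
    LEFT OUTER JOIN c ON b.c_id = c.id
    ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 1, 1, 42)
# (1, 2, None, None, None)
```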

When attempting to join and aggregate the rows, we see the following stack trace:

Error
Traceback (most recent call last):
  File "/Users/ericely/workspace/test/test_peewee.py", line 48, in test_peewee
    print len(records)
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 3298, in __len__
    return len(self.execute())
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2334, in __len__
    return self.count
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2330, in count
    self.fill_cache()
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2377, in fill_cache
    next(self)
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2363, in next
    obj = self.iterate()
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2761, in iterate
    instance._data[metadata.foreign_key.name]]
KeyError: None

The code to generate this is:

import unittest
from peewee import *

db = SqliteDatabase(':memory:')

# create a base model class that our application's models will extend
class BaseModel(Model):
    class Meta:
        database = db

class A(BaseModel):
    val = FloatField(default=3.14)

class C(BaseModel):
    val = IntegerField(default=42)

class B(BaseModel):
    a = ForeignKeyField(A, null=True, default=None)
    c = ForeignKeyField(C, null=True, default=None)

class Osha_Violation_Model_Tests(unittest.TestCase):

    def test_peewee(self):
        A.create_table()
        B.create_table()
        C.create_table()

        # save the first record chain
        A().save(force_insert=True)
        a = A.get(A.id == 1)

        C().save(force_insert=True)
        c = C.get(C.id == 1)

        B(a=a, c=c).save(force_insert=True)

        # save the second record chain, starting with the same A, without a C link
        a = A.get(A.id == 1)

        B(a=a).save(force_insert=True)

        records = A\
            .select(A, B, C)\
            .join(B, JOIN_LEFT_OUTER)\
            .join(C, JOIN_LEFT_OUTER)\
            .aggregate_rows()

        print len(records)

This is peewee 2.10.2. It isn’t the latest release, but we are still using it because 3.X introduces backward-incompatible changes. I know aggregate_rows was removed in 3.X and prefetch is now recommended, but we still have a relatively large codebase on 2.10.2.

Issue Analytics

Created 5 years ago
Comments: 13 (7 by maintainers)

Top GitHub Comments

eely22 commented, May 16, 2018

For anyone who stumbles upon this, here is how we ultimately solved the issue:

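a_s = (A
     .select()
     .join(B, JOIN.LEFT_OUTER)
     .join(C, JOIN.LEFT_OUTER)
     .order_by(C.val)
     .group_by(A.id))  # group on A's primary key to drop duplicate A rows
# load the related B and C rows through separate subqueries
all_together = prefetch(a_s, B, C)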

This lets us order A by columns from the joined tables while deduplicating by grouping on A’s primary key. The related B and C rows are then prefetched through subqueries, so there are no duplicate A objects and all other fields are loaded up front.

Performance can still be a concern as you are now joining multiple tables, grouping, and then running subqueries on top of that, so your mileage may vary depending on your situation.

coleifer commented, May 16, 2018

"peewee is still returning me duplicate A objects"

Exactly. All aggregate_rows() did was to roll up the duplicates and accumulate joined rows as related instances. That’s why I suggested using something like itertools.groupby.
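
As a rough illustration of the itertools.groupby suggestion, the sketch below rolls up flattened join rows by hand. It uses plain tuples instead of peewee model instances, and the row layout (a_id, b_id, c_val) and the roll_up helper are hypothetical names introduced for this example only. It is not how aggregate_rows was implemented, just one way to get a similar result.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flattened join output: one tuple per joined row,
# laid out as (a_id, b_id, c_val). The same A can appear several times.
rows = [
    (1, 1, 42),
    (1, 2, None),  # B row whose C is NULL
    (2, 3, 7),
]

def roll_up(rows):
    """Collapse duplicate A rows, collecting the related (b_id, c_val) pairs."""
    result = []
    # groupby only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(rows, key=itemgetter(0))
    for a_id, group in groupby(ordered, key=itemgetter(0)):
        # Skip rows where the outer join found no B at all.
        related = [(b_id, c_val) for _, b_id, c_val in group if b_id is not None]
        result.append((a_id, related))
    return result

aggregated = roll_up(rows)
print(aggregated)
# [(1, [(1, 42), (2, None)]), (2, [(3, 7)])]
```

With a real query, the same idea would apply to the model instances returned by the joined select, grouping on the primary key of A after ordering by it.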