"KeyError: None" when using JOIN and aggregate_rows in 2.10.2

There is a case where combining joins with aggregate_rows raises an uncaught exception. It appears to occur when tables are joined through a junction table: there are two tables, A and C, and a third table, B, that links them. B has two rows, both pointing to the same A. One B row references a C object and the other has a null C, like this:

    B - C
   / 
A 
   \ 
    B - NULL
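
To make the data layout concrete, here is a minimal sketch using only the standard library's sqlite3 module rather than peewee. The table and column names (a, b, c, a_id, c_id) are assumptions chosen for illustration and are not taken from peewee's generated schema. It shows that the double LEFT OUTER JOIN returns one row per B, and that the second row carries NULL (None in Python) for every C column:

```python
import sqlite3

# Hand-written schema mirroring the A / B / C layout above (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, val REAL);
    CREATE TABLE c (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY,
                    a_id INTEGER REFERENCES a(id),
                    c_id INTEGER REFERENCES c(id));
    INSERT INTO a (id, val) VALUES (1, 3.14);
    INSERT INTO c (id, val) VALUES (1, 42);
    INSERT INTO b (id, a_id, c_id) VALUES (1, 1, 1);
    INSERT INTO b (id, a_id, c_id) VALUES (2, 1, NULL);
""")

# Columns: a.id, b.id, b.c_id, c.id
joined_rows = conn.execute("""
    SELECT a.id, b.id, b.c_id, c.id
    FROM a
    LEFT OUTER JOIN b ON b.a_id = a.id
    LEFT OUTER JOIN c ON b.c_id = c.id
    ORDER BY b.id
""").fetchall()
conn.close()

for row in joined_rows:
    print(row)
```

Any code that rolls these rows up by looking related objects up in a dictionary keyed by the foreign key value has to handle the None in the second row. That is one plausible reading of the KeyError: None reported in this issue, not a confirmed diagnosis.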

When attempting to join and aggregate the rows, we see the following stack trace:

Error
Traceback (most recent call last):
  File "/Users/ericely/workspace/test/test_peewee.py", line 48, in test_peewee
    print len(records)
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 3298, in __len__
    return len(self.execute())
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2334, in __len__
    return self.count
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2330, in count
    self.fill_cache()
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2377, in fill_cache
    next(self)
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2363, in next
    obj = self.iterate()
  File "/Users/ericely/workspace/env/lib/python2.7/site-packages/peewee.py", line 2761, in iterate
    instance._data[metadata.foreign_key.name]]
KeyError: None

The code to generate this is:

import unittest
from peewee import *

db = SqliteDatabase(':memory:')

# create a base model class that our application's models will extend
class BaseModel(Model):
    class Meta:
        database = db

class A(BaseModel):
    val = FloatField(default=3.14)

class C(BaseModel):
    val = IntegerField(default=42)

class B(BaseModel):
    a = ForeignKeyField(A, null=True, default=None)
    c = ForeignKeyField(C, null=True, default=None)

class Osha_Violation_Model_Tests(unittest.TestCase):

    def test_peewee(self):
        A.create_table()
        B.create_table()
        C.create_table()

        # save the first record chain
        A().save(force_insert=True)
        a = A.get(A.id == 1)

        C().save(force_insert=True)
        c = C.get(C.id == 1)

        B(a=a, c=c).save(force_insert=True)

        # save the second record chain, starting with the same A, without a C link
        a = A.get(A.id == 1)

        B(a=a).save(force_insert=True)

        records = A\
            .select(A, B, C)\
            .join(B, JOIN_LEFT_OUTER)\
            .join(C, JOIN_LEFT_OUTER)\
            .aggregate_rows()

        print len(records)

This is peewee 2.10.2. It isn’t the latest release, but we are still using it because 3.X introduced backward-incompatible changes. I know aggregate_rows was removed in 3.X and prefetch is now recommended, but we still have a relatively large codebase on 2.10.2.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

eely22 commented, May 16, 2018 (1 reaction)

For anyone who stumbles upon this, here is how we ultimately solved the issue:

a_s = (A
     .select()
     .join(B, JOIN.LEFT_OUTER)
     .join(C, JOIN.LEFT_OUTER)
     .order_by(C.val)
     .group_by(A.id))
all_together = prefetch(a_s, B, C)

This lets us order A by columns of the joined tables while deduplicating by grouping on A’s primary key. The related B and C rows are then prefetched through subqueries, so there are no duplicate A objects and all other fields are loaded up front.

Performance can still be a concern as you are now joining multiple tables, grouping, and then running subqueries on top of that, so your mileage may vary depending on your situation.
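
The two-step idea above (dedupe the parent rows with GROUP BY, then load the children in a separate query and attach them in Python) can be sketched without peewee. This is only an illustration under assumptions: the schema names are invented, a literal id list stands in for the subquery that prefetch would issue, and ORDER BY MIN(c.val) is used so that the ordering is well defined for each group.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, val REAL);
    CREATE TABLE c (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, c_id INTEGER);
    INSERT INTO a (id, val) VALUES (1, 3.14);
    INSERT INTO c (id, val) VALUES (1, 42);
    INSERT INTO b (id, a_id, c_id) VALUES (1, 1, 1);
    INSERT INTO b (id, a_id, c_id) VALUES (2, 1, NULL);
""")

# Step 1: one row per A, ordered by a value from the joined C table.
a_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM a
    LEFT OUTER JOIN b ON b.a_id = a.id
    LEFT OUTER JOIN c ON b.c_id = c.id
    GROUP BY a.id
    ORDER BY MIN(c.val)
""")]

# Step 2: fetch the related B rows in a second query and attach them by parent id.
placeholders = ",".join("?" for _ in a_ids)
bs_by_a = defaultdict(list)
query = "SELECT id, a_id, c_id FROM b WHERE a_id IN (%s) ORDER BY id" % placeholders
for b_id, a_id, c_id in conn.execute(query, a_ids):
    bs_by_a[a_id].append((b_id, c_id))
conn.close()

# Each A appears once, paired with its (b_id, c_id) children; c_id may be None.
result = [(a_id, bs_by_a[a_id]) for a_id in a_ids]
print(result)
```

The cost noted above shows up here as well: the first query still performs both joins and a grouping, and every related table adds another round trip.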

coleifer commented, May 16, 2018 (0 reactions)

"peewee is still returning me duplicate A objects"

Exactly. All aggregate_rows() did was to roll up the duplicates and accumulate joined rows as related instances. That’s why I suggested using something like itertools.groupby.
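
As a rough sketch of that suggestion, the following uses itertools.groupby on plain tuples instead of peewee model instances. The row shape (a_id, b_id, c_id) and the roll_up helper are hypothetical and exist only for this example; with real query results the same grouping would be applied to whatever key identifies A.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flat rows, shaped like the output of A LEFT JOIN B LEFT JOIN C:
# (a_id, b_id, c_id). None marks the missing side of an outer join.
rows = [
    (1, 1, 1),
    (1, 2, None),
    (2, None, None),
]

def roll_up(rows):
    """Collapse duplicate A rows and collect their related (b_id, c_id) pairs."""
    result = []
    # groupby only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(rows, key=itemgetter(0))
    for a_id, group in groupby(ordered, key=itemgetter(0)):
        # Skip rows where the outer join found no B at all.
        related = [(b_id, c_id) for _, b_id, c_id in group if b_id is not None]
        result.append((a_id, related))
    return result

aggregated = roll_up(rows)
print(aggregated)
```

Because a missing C is kept as None inside the pair rather than used as a lookup key, a B without a C needs no special handling.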
