Efficient lookup in many-many relationship
I have the following schema, where the relationship between Executable and Symbol is many-to-many.
    from datetime import datetime
    from pony.orm import Database, Optional, Required, Set

    db = Database()

    class File(db.Entity):
        loc = Required(str, unique=True)
        tim = Optional(datetime)

    class Executable(File):
        sym = Set("Symbol")

    class Symbol(db.Entity):
        sig = Required(str, 5000, encoding='utf-8')
        exe = Set(Executable)
Pony creates an intermediate table called Executable_Symbol to store this relationship, but there seems to be no way to check through the ORM whether a particular relationship exists unless I drop down to raw SQL, i.e.
    eid, sid = exe.id, sym.id
    db.select('* from Executable_Symbol where executable=$eid and symbol=$sid')
The best ORM-level approach I could find is this: given a Symbol called sym and an Executable called exe, I can use the expression:
    exe in sym.exe
But this seems to be very slow. In comparison, accessing the Executable_Symbol table using raw SQL is much faster, but dropping to raw SQL is not very desirable. My application would check this a few hundred thousand times, so every bit of efficiency would be useful.
Is there a better way to do this?
thanks!
Issue Analytics
- State:
- Created 10 years ago
- Comments:13 (7 by maintainers)
Hi @hfaran, if a `Set` attribute does not have the `lazy=True` option, then Pony assumes that the collection is probably not too big and that it is more efficient to load all items together as soon as some items are requested. Pony operates this way in order to reduce the total number of queries. For example, if you calculate something like `course1 in student1.courses` and the `courses` collection is not lazy, then all items will be loaded. After that, `course2 in student1.courses` will not lead to a new query and will return the answer immediately.

If the `lazy=True` option is specified, then Pony will load only the items that are necessary to perform the operation. This way Pony minimizes the number of rows loaded instead of the number of queries. With the `lazy=True` option, `course1 in student1.courses` will load a single row only. Some operations will still load the full collection content, for example looping over all items of the collection, which loads all items in a single query. But you can restrict the number of loaded items by filtering the collection, in which case only the filtered items will be loaded. Also note that `len(c.courses)` loads all items, while `c.courses.count()` just calculates `COUNT(*)` in SQL.

Hi @kozlovsky, I know this issue is now over 3 years old, so I just wanted to clarify current Pony behaviour.
Let’s say I have a `Set` which I expect may grow to over several million entries. If I do not create the `Set` attribute with `lazy=True`, will Pony attempt to load all of those several million entities into the cache?