In-memory caching of instances of user-defined classes does not preserve class identity
Summary
When Streamlit reruns a script that contains a class definition, that class object is created anew in memory. A cached instance of the class, however, remains bound to the old class object. A newly created instance therefore ends up belonging to a different class than a cached instance, which can lead to hard-to-debug errors.
Steps to reproduce
- Run this code with `streamlit run`:

```python
from enum import Enum
import streamlit as st

class A(Enum):
    Var1 = 0

@st.cache
def get_enum_dict():
    return {A.Var1: "Hi"}

look_up_key = A.Var1
cached_value = get_enum_dict()
st.write("class id of look_up_key: {}".format(id(look_up_key.__class__)))
st.write("class id of cached key: {}".format(id(list(cached_value.keys())[0].__class__)))
st.write(cached_value[look_up_key])
```
- Rerun by pressing ‘r’
Expected behavior:
Rerunning should print the same id for the class of look_up_key and for the key in cached_value, and the code should still print "Hi" at the end.
Actual behavior:
On the initial run the code prints the same id twice and the dictionary lookup succeeds.
But on rerun the class ids differ and a KeyError: <A.Var1: 0> is raised.
Is this a regression?
no
Debug info
- Streamlit version: 0.71.0
- Python version: 3.8.3
- Using Conda
- OS version: Mac OS 10.15.7
- Browser version: Firefox 82.0.3 (64-Bit)
Additional information
This bug is not unique to Enums; it happens with any user-defined class that gets re-evaluated on a rerun. I had the same problem with other classes, but this example is the easiest to reduce.
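For illustration only (not Streamlit code), the same mismatch can be reproduced in plain Python by executing a class definition twice, which is roughly what a script rerun does:

```python
# Executing the same class definition twice produces two distinct class
# objects, so instances created from the "old" class no longer match the
# "new" one.
source = """
class A:
    pass
"""

first_run = {}
exec(source, first_run)            # first "run" of the script
old_instance = first_run["A"]()    # instance created during the first run

second_run = {}
exec(source, second_run)           # "rerun" of the script
NewA = second_run["A"]             # same source, but a brand-new class object

print(old_instance.__class__ is NewA)   # False: different class objects
print(isinstance(old_instance, NewA))   # False: the old instance is not an
                                        # instance of the new class
```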
Ideas on how to fix it
Pickling and unpickling the cached object causes the class id to be updated to the new definition.
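This works because pickle stores a class by its module and qualified name rather than by object identity, so unpickling re-binds the value to whatever class is currently defined under that name. A minimal sketch of that behavior (variable names chosen only for illustration):

```python
import pickle
from enum import Enum

class A(Enum):
    Var1 = 0

payload = pickle.dumps(A.Var1)   # stores only "module + qualified name" of A

class A(Enum):                   # simulate the rerun redefining the class
    Var1 = 0

restored = pickle.loads(payload) # the class is looked up again by name
print(restored.__class__ is A)   # True: bound to the *new* definition
print(restored is A.Var1)        # True: the Enum member is re-resolved as well
```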
A very helpful short-term band-aid would be a separate st.cache option that forces pickling and unpickling also for the in-memory cache. That way the user can selectively circumvent the bug for the problematic types.
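Until such an option exists, a user-side approximation of that band-aid is possible. This is only a sketch (get_enum_dict_pickled is a hypothetical variant of the reproduction script above): cache the pickled bytes instead of the live object and unpickle them on every access.

```python
import pickle
from enum import Enum

import streamlit as st

class A(Enum):
    Var1 = 0

@st.cache
def get_enum_dict_pickled():
    # Cache the pickled bytes instead of the live object; the bytes refer to
    # the class by name only, never by identity.
    return pickle.dumps({A.Var1: "Hi"})

# Unpickling on every access re-binds the keys to whatever A is defined in the
# current run, so the lookup keeps working after a rerun.
cached_value = pickle.loads(get_enum_dict_pickled())
st.write(cached_value[A.Var1])
```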
Long term I have two ideas, but I do not know how feasible they are. Walk the object hierarchy of every cached value and:
- apply in-memory pickling selectively, only to classes whose definitions live in files that might be rerun during a session, or
- "hot-patch" the __class__ field upon retrieval from the cache (sketched after this list). I do not know whether that is reliable in Python or whether it has unintended side effects.
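For the second idea, a rough sketch of what such a hot-patch could look like for a single object. The helper repoint_class is hypothetical and not part of Streamlit; a real fix would also have to walk containers and handle types where __class__ assignment is not allowed.

```python
import sys

def repoint_class(obj):
    """Hypothetical sketch: re-bind obj to the class that is *currently*
    defined under the same module and qualified name."""
    cls = obj.__class__
    module = sys.modules.get(cls.__module__)
    # Look up the class name again; nested classes with dotted qualnames
    # would need extra handling.
    current = getattr(module, cls.__qualname__, cls) if module else cls
    if current is not cls:
        try:
            obj.__class__ = current   # only works for compatible layouts
        except TypeError:
            pass                      # e.g. Enum members, __slots__, builtins
    return obj
```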
Top GitHub Comments
Thank you so much for posting this. This was a very aggravating bug to track down. The stack trace would show that `enum`s which are supposed to be identical were not. I was so confused and frustrated. This bug made it difficult for me to use `streamlit` with a mature code base that relied on `enum` hashing for various data operations.

Just want to add another voice to this. I've been bit by this as well, wanting to do branching based on `isinstance`. I also want to be able to use `Enum`s in my code, but have had to give up on that. I want to be able to write library code that is agnostic to the UI I put on top of it. This is the number one issue that stops me from doing that with Streamlit.
This issue is underrated.