[SIP-23] Move SQL Lab storage out of browser localStorage
Motivation
Currently, we store SQL Lab state in the browser localStorage, including tabs, their queries and results. The redux state is persisted to localStorage using the redux-localstorage library.
While the implementation is clean, it has a few drawbacks:
- The state is not preserved across browsers, or if the user clears the application data in the browser.
- Upgrades might leave the state in a bad shape, preventing SQL Lab from working successfully. We observed this a few times at Lyft, and users would use incognito mode until we instructed them to delete the application data.
- Storage is limited to 5 MB, a limit hardcoded in the browser.
At Lyft we’re currently migrating from BigQuery to Superset, and the project requires querying tables with nested fields (see the work on https://github.com/apache/incubator-superset/pull/7625, https://github.com/apache/incubator-superset/pull/7627 and https://github.com/apache/incubator-superset/pull/7693). Here’s what we see when querying the first 100 rows from one of our tables:
In this case, the query runs automatically to populate the data preview when the user selects the table in the table browser (on the left of SQL Lab) to inspect it. This makes the browser extremely sluggish and can even crash it.
Proposed Change
I propose moving the persistence of SQL Lab’s state from the browser localStorage to the metadata database. State would be synced the following way:
backend -> frontend
- On load, the bootstrap payload contains a list of tab IDs, and the active tab ID.
- SQL Lab loads the active tab asynchronously by ID. This will load:
  - selected database
  - selected schema
  - any table schemas
  - query in textarea
  - results (if query has run)
  - any table previews
- On tab switch, the corresponding tab is loaded asynchronously in a similar way.
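The load sequence above can be sketched as follows. This is a minimal in-memory illustration, not Superset code: `TABS`, `bootstrap_payload` and `load_tab` are hypothetical names, and the dictionary fields simply mirror the items listed above.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory stand-in for the metadata database.
TABS: Dict[int, Dict[str, Any]] = {
    1: {
        "label": "Untitled Query 1",
        "active": False,
        "database_id": 1,
        "schema": "public",
        "table_schemas": [],
        "sql": "SELECT 1",
        "results": None,
    },
    2: {
        "label": "Untitled Query 2",
        "active": True,
        "database_id": 1,
        "schema": "public",
        "table_schemas": [],
        "sql": "SELECT * FROM t LIMIT 100",
        "results": None,
    },
}


def bootstrap_payload(tabs: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Return only the tab IDs and the active tab ID, keeping the page load small."""
    tab_ids = sorted(tabs)
    active: Optional[int] = next(
        (tab_id for tab_id in tab_ids if tabs[tab_id]["active"]), None
    )
    return {"tab_ids": tab_ids, "active_tab_id": active}


def load_tab(tabs: Dict[int, Dict[str, Any]], tab_id: int) -> Dict[str, Any]:
    """Return the full state of one tab; used on load and on every tab switch."""
    return {"id": tab_id, **tabs[tab_id]}
```

In this sketch the heavy fields (query text, results, table previews) travel only in the per-tab response, so opening SQL Lab with many tabs costs one small payload plus one tab fetch.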
frontend -> backend
- Tabs are saved every time a query changes (e.g., while the user is typing), with debouncing.
- Tabs are saved every time a query is executed.
- Tabs are saved every time results are loaded.
- The same applies to other changes (database, schema, table preview).
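The debouncing mentioned in the first item could work roughly as below. This is a sketch of the idea only: the real logic would live in the frontend, `Debouncer` is a hypothetical name, and time is passed in explicitly so the behaviour is deterministic.

```python
from typing import Callable, List, Optional


class Debouncer:
    """Collapse a burst of changes into one save after `wait` seconds of quiet.

    Hypothetical helper for illustration; not part of Superset.
    """

    def __init__(self, save: Callable[[str], None], wait: float) -> None:
        self._save = save
        self._wait = wait
        self._pending: Optional[str] = None
        self._last_change = 0.0

    def change(self, sql: str, now: float) -> None:
        """Record the latest editor contents; nothing is saved yet."""
        self._pending = sql
        self._last_change = now

    def tick(self, now: float) -> bool:
        """Save the pending value if the user has been idle for at least `wait`."""
        if self._pending is not None and now - self._last_change >= self._wait:
            self._save(self._pending)
            self._pending = None
            return True
        return False


saved: List[str] = []
debouncer = Debouncer(saved.append, wait=0.5)
debouncer.change("SELECT", now=0.0)
debouncer.change("SELECT 1", now=0.2)
debouncer.tick(now=0.4)  # user is still typing: nothing is saved
debouncer.tick(now=0.8)  # idle for longer than 0.5 s: the latest text is saved
```

Only the last value of a burst reaches the backend, which keeps the number of save requests proportional to pauses in typing rather than to keystrokes.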
These changes should be transparent to the user, except for the additional latency from requesting the state from the server instead of reading it from localStorage.
New or Changed Public Interfaces
For this work, we need to create the following models:
# superset/models/sql_lab.py
from flask_appbuilder import Model
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, Text
from sqlalchemy.orm import relationship


class TabState(Model):
    __tablename__ = 'tab_state'

    # basic info
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('ab_user.id'))
    label = Column(String(256))
    active = Column(Boolean, default=False)

    # tables that are open in the schema browser and their data previews
    table_schemas = relationship('TableSchema')

    # the query in the textarea, and results (if any)
    # we'll reuse the Query model, since it has everything we need
    # (note that this requires having results_key set even for sync queries)
    query_id = Column(Integer, ForeignKey('query.id'))
    query = relationship('Query')


class TableSchema(Model):
    __tablename__ = 'table_schema'

    id = Column(Integer, primary_key=True)
    tab_state_id = Column(Integer, ForeignKey('tab_state.id'))

    # DB
    database_id = Column(Integer, ForeignKey('dbs.id'), nullable=False)
    database = relationship('Database', foreign_keys=[database_id])
    schema = Column(String(256))
    table = Column(String(256))

    # JSON describing the schema, partitions, latest partition, etc.
    results = Column(Text)
These models will be exposed through views automatically generated by FAB (Flask-AppBuilder).
Note that since we’re loading the query results from the server, synchronous queries also need to store their results in a results backend. When none is set, we can fall back to a simple results backend (e.g., werkzeug.contrib.cache.BaseCache).
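A fallback of this kind might look like the sketch below. It is an assumption-laden illustration rather than the proposed implementation: `InMemoryResultsBackend` and `get_results_backend` are hypothetical names, and the class only imitates a `get`/`set` cache interface instead of using any real cache library.

```python
import time
from typing import Any, Dict, Optional, Tuple


class InMemoryResultsBackend:
    """Hypothetical fallback used when no results backend is configured."""

    def __init__(self, default_timeout: float = 3600.0) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._default_timeout = default_timeout

    def set(self, key: str, value: Any, timeout: Optional[float] = None) -> None:
        """Store a value together with its expiry time."""
        ttl = self._default_timeout if timeout is None else timeout
        self._store[key] = (time.monotonic() + ttl, value)

    def get(self, key: str) -> Any:
        """Return the stored value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value


def get_results_backend(configured: Optional[Any]) -> Any:
    """Return the configured backend, or the in-memory fallback when none is set."""
    return configured if configured is not None else InMemoryResultsBackend()
```

An in-process dictionary is visible only to the worker that wrote it, so a fallback like this would suit single-process deployments at best.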
Optionally, we can store results ourselves in the database with a simple model:
class Results(Model):
    __tablename__ = 'results'

    # used as results_key for sync queries
    id = Column(Integer, primary_key=True)

    # JSON with the results payload
    results = Column(Text)
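To illustrate how such a table would be used, here is a small sketch with the standard library's `sqlite3` instead of the SQLAlchemy model. The helper names (`create_results_table`, `store_results`, `load_results`) are hypothetical; only the table layout follows the model above.

```python
import json
import sqlite3
from typing import Any, Optional


def create_results_table(conn: sqlite3.Connection) -> None:
    """Create a table with the same two columns as the Results model."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (id INTEGER PRIMARY KEY, results TEXT)"
    )


def store_results(conn: sqlite3.Connection, payload: Any) -> Optional[int]:
    """Serialize the payload to JSON and return the row id, usable as results_key."""
    cursor = conn.execute(
        "INSERT INTO results (results) VALUES (?)", (json.dumps(payload),)
    )
    conn.commit()
    return cursor.lastrowid


def load_results(conn: sqlite3.Connection, results_key: int) -> Any:
    """Return the decoded payload for a results_key, or None if it does not exist."""
    row = conn.execute(
        "SELECT results FROM results WHERE id = ?", (results_key,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

A synchronous query would call `store_results` once it finishes and keep the returned id as its `results_key`, so the frontend can fetch the results later through the same path as asynchronous queries.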
New dependencies
None.
Migration Plan and Compatibility
We should implement logic that moves the state from localStorage to the backend when we detect that it’s stored in the browser.
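The conversion step of that migration could be sketched as below. The shape of the persisted blob is an assumption made for illustration (a `sqlLab` object with `queryEditors` and `tabHistory`, and editor fields `id`, `title`, `dbId`, `schema`, `sql`), and `tabs_from_local_storage` is a hypothetical helper; the real redux state may differ.

```python
import json
from typing import Any, Dict, List, Optional


def tabs_from_local_storage(raw: Optional[str]) -> List[Dict[str, Any]]:
    """Convert a persisted redux blob into one backend payload per tab.

    ASSUMPTION: the blob looks like
    {"sqlLab": {"queryEditors": [...], "tabHistory": [...]}}.
    Returns an empty list when nothing is stored in the browser.
    """
    if not raw:
        return []
    state = json.loads(raw).get("sqlLab", {})
    history = state.get("tabHistory", [])
    # Assume the most recently visited tab is the active one.
    active_id = history[-1] if history else None
    payloads = []
    for editor in state.get("queryEditors", []):
        payloads.append(
            {
                "label": editor.get("title"),
                "active": active_id is not None and editor.get("id") == active_id,
                "database_id": editor.get("dbId"),
                "schema": editor.get("schema"),
                "sql": editor.get("sql", ""),
            }
        )
    return payloads
```

Each returned payload would be posted to the backend once, after which the browser copy can be cleared so the migration does not run again.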
Rejected Alternatives
None.
Top GitHub Comments
query is just a lazy attribute, not a real column. Accessing it returns the Query object.