[SIP-23] Move SQL Lab storage out of browser localStorage

[SIP] Proposal for moving SQL Lab storage out of browser localStorage

Motivation

Currently, we store SQL Lab state in the browser localStorage, including tabs, their queries and results. The redux state is persisted to localStorage using the redux-localstorage library.

While the implementation is clean, it has a few drawbacks:

  • The state is not preserved across browsers, and it is lost if the user clears the application data in the browser.
  • Upgrades might leave the persisted state in a bad shape, preventing SQL Lab from working. We observed this a few times at Lyft, and users would resort to incognito mode until we instructed them to delete the application data.
  • Storage is limited to roughly 5 MB, a cap enforced by the browser that the application cannot raise.

At Lyft we’re currently migrating from BigQuery to Superset, and the project requires querying tables with nested fields (see the work on https://github.com/apache/incubator-superset/pull/7625, https://github.com/apache/incubator-superset/pull/7627 and https://github.com/apache/incubator-superset/pull/7693). Here’s what we see when querying the first 100 rows from one of our tables:

[Screenshot, 2019-06-19: SQL Lab after querying the first 100 rows of the table]

In this case, the query runs automatically to build the data preview when the user selects the table in the table browser (on the left of SQL Lab) in order to inspect it. This makes the browser extremely sluggish and can even crash it.

Proposed Change

I propose moving the persistence of SQL Lab’s state from the browser localStorage to the metadata database. State would be synced as follows:

backend -> frontend

  • On load, the bootstrap payload contains a list of tab IDs, and the active tab ID.
  • SQL Lab loads the active tab asynchronously by ID. This will load:
    • selected database
    • selected schema
    • any table schemas
    • query in textarea
    • results (if query has run)
    • any table previews
  • On tab switch, the corresponding tab is loaded asynchronously in a similar way.
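
The load sequence above can be sketched roughly as follows. This is a minimal illustration, not Superset code: the in-memory TABS dictionary stands in for the metadata database, and the function names build_bootstrap_payload and load_tab, the payload keys, and the sample values are all hypothetical.

```python
# Hypothetical in-memory stand-in for the metadata database.
TABS = {
    1: {"label": "Untitled Query 1", "active": False, "database_id": 1,
        "schema": "core", "table_schemas": [], "sql": "SELECT 1", "results": None},
    2: {"label": "Untitled Query 2", "active": True, "database_id": 1,
        "schema": "core", "table_schemas": [], "sql": "SELECT 2", "results": None},
}


def build_bootstrap_payload(tabs):
    """Return only the tab IDs and the active tab ID, with no heavy state."""
    active = next((tab_id for tab_id, tab in tabs.items() if tab["active"]), None)
    return {"tab_ids": sorted(tabs), "active_tab_id": active}


def load_tab(tabs, tab_id):
    """Return the full state of a single tab, fetched on demand by ID."""
    return {"id": tab_id, **tabs[tab_id]}


payload = build_bootstrap_payload(TABS)
active_tab = load_tab(TABS, payload["active_tab_id"])
```

The point of the split is that the bootstrap payload stays small no matter how large the results of individual tabs are; only the tab being viewed is transferred.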

frontend -> backend

  • Tabs are saved every time a query changes (e.g., as the user types), with debouncing.
  • Tabs are saved every time a query is executed.
  • Tabs are saved every time results are loaded.
  • Tabs are saved in the same way on other changes (database, schema, table preview).
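
The debounced save can be sketched as below. The real logic would live in the frontend; Python is used here only to stay consistent with the models in this proposal. The class name DebouncedTabSaver, the save callback, and the one-second delay are assumptions, and the clock is passed in explicitly so the behavior is deterministic rather than timer-driven.

```python
class DebouncedTabSaver:
    """Collapse bursts of edits into one save per tab (trailing-edge debounce)."""

    def __init__(self, save, delay=1.0):
        self._save = save      # callable(tab_id, sql); hypothetical backend call
        self._delay = delay    # quiet period in seconds
        self._pending = {}     # tab_id -> (latest sql, time of last edit)

    def on_change(self, tab_id, sql, now):
        """Record an edit; this restarts the quiet period for that tab."""
        self._pending[tab_id] = (sql, now)

    def tick(self, now):
        """Save every tab whose last edit is at least `delay` seconds old."""
        due = [t for t, (_, ts) in self._pending.items() if now - ts >= self._delay]
        for tab_id in due:
            sql, _ = self._pending.pop(tab_id)
            self._save(tab_id, sql)


saved = []
saver = DebouncedTabSaver(lambda tab_id, sql: saved.append((tab_id, sql)), delay=1.0)
saver.on_change(1, "SEL", now=0.0)
saver.on_change(1, "SELECT 1", now=0.5)
saver.tick(now=1.0)  # only 0.5 s since the last edit: nothing is saved yet
saver.tick(now=1.5)  # quiet period elapsed: one save with the latest text
```

Two keystrokes produce a single save carrying the final text, which is what keeps the request volume manageable while the user types.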

These changes should be transparent to the user, except for the additional latency of requesting the state from the server instead of reading it from localStorage.

New or Changed Public Interfaces

For this work, we need to create the following models:

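# superset/models/sql_lab.py
from flask_appbuilder import Model
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, Text
from sqlalchemy.orm import relationship


class TabState(Model):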

    __tablename__ = 'tab_state'

    # basic info
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('ab_user.id'))
    label = Column(String(256))
    active = Column(Boolean, default=False)

    # tables that are open in the schema browser and their data previews
    table_schemas = relationship('TableSchema')

    # the query in the textarea, and results (if any)
    # we'll reuse the Query model, since it has everything we need
    # (note that this requires having results_key set even for sync queries)
    query_id = Column(Integer, ForeignKey('query.id'))
    query = relationship('Query')


class TableSchema(Model):

    __tablename__ = 'table_schema'

    id = Column(Integer, primary_key=True)
    tab_state_id = Column(Integer, ForeignKey('tab_state.id'))

    # DB
    database_id = Column(Integer, ForeignKey('dbs.id'), nullable=False)
    database = relationship('Database', foreign_keys=[database_id])
    schema = Column(String(256))
    table = Column(String(256))
  
    # JSON describing the schema, partitions, latest partition, etc.
    results = Column(Text)
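
To make the relationships between these models concrete, here is a rough equivalent in plain SQL, run through Python's built-in sqlite3 module instead of SQLAlchemy. Column names follow the models above. The stub query table, the load_tab helper, and the sample rows are assumptions added only so the sketch runs on its own.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "query" (id INTEGER PRIMARY KEY, sql TEXT);  -- stub for the existing Query model
CREATE TABLE tab_state (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,                -- FK to ab_user.id in the model above
    label VARCHAR(256),
    active BOOLEAN DEFAULT 0,
    query_id INTEGER REFERENCES "query" (id)
);
CREATE TABLE table_schema (
    id INTEGER PRIMARY KEY,
    tab_state_id INTEGER REFERENCES tab_state (id),
    database_id INTEGER NOT NULL,   -- FK to dbs.id in the model above
    "schema" VARCHAR(256),
    "table" VARCHAR(256),
    results TEXT                    -- JSON describing the table
);
""")

# Hypothetical sample data: one tab with one open table schema.
conn.execute('INSERT INTO "query" (id, sql) VALUES (1, ?)', ("SELECT * FROM trips LIMIT 100",))
conn.execute(
    "INSERT INTO tab_state (id, user_id, label, active, query_id) VALUES (1, 42, ?, 1, 1)",
    ("Untitled Query",),
)
conn.execute(
    'INSERT INTO table_schema (tab_state_id, database_id, "schema", "table", results) '
    "VALUES (1, 1, ?, ?, ?)",
    ("core", "trips", json.dumps({"columns": ["id", "fare"]})),
)


def load_tab(conn, tab_id):
    """Fetch one tab together with its query text and open table schemas."""
    label, sql = conn.execute(
        'SELECT t.label, q.sql FROM tab_state t '
        'LEFT JOIN "query" q ON q.id = t.query_id WHERE t.id = ?',
        (tab_id,),
    ).fetchone()
    rows = conn.execute(
        'SELECT "schema", "table", results FROM table_schema WHERE tab_state_id = ?',
        (tab_id,),
    ).fetchall()
    return {
        "label": label,
        "sql": sql,
        "table_schemas": [
            {"schema": s, "table": t, "results": json.loads(r)} for s, t, r in rows
        ],
    }


tab = load_tab(conn, 1)
```

A tab therefore owns zero or more table_schema rows and points at a single query row, which carries the SQL text and, through its results key, the results.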

These will be exposed through views automatically generated by FAB (Flask-AppBuilder).

Note that because we would be loading query results from the server, synchronous queries also need their results stored in a results backend. We can fall back to a simple results backend (e.g., werkzeug.contrib.cache.BaseCache) when none is set.
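
The fallback described above could look something like the following sketch. Everything here is hypothetical: SimpleResultsBackend is an illustrative in-memory class with cache-style get and set methods, and get_results_backend and store_sync_results are made-up helper names, not existing Superset functions.

```python
import json


class SimpleResultsBackend:
    """Hypothetical in-memory fallback with a cache-style get/set interface."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value
        return True

    def get(self, key):
        # Returns None for unknown keys, like a cache miss.
        return self._store.get(key)


def get_results_backend(configured_backend=None):
    """Use the configured backend if there is one, else the in-memory fallback."""
    if configured_backend is not None:
        return configured_backend
    return SimpleResultsBackend()


def store_sync_results(backend, results_key, payload):
    """Persist a synchronous query's payload so a tab can reload it later."""
    backend.set(results_key, json.dumps(payload))
    return results_key


backend = get_results_backend(None)  # nothing configured: fallback is used
key = store_sync_results(backend, "abc123", {"data": [{"id": 1}]})
reloaded = json.loads(backend.get(key))
```

An in-memory fallback like this one lives inside a single process and is lost on restart, so it only suits small deployments.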

Optionally, we can store results ourselves in the database with a simple model:

class Results(Model):

    __tablename__ = 'results'

    # used as results_key for sync queries
    id = Column(Integer, primary_key=True)

    # JSON with the results payload
    results = Column(Text)

New dependencies

None.

Migration Plan and Compatibility

We should implement logic that moves the state from localStorage to the backend when we detect that it’s stored in the browser.
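
One possible shape for that migration step is sketched below, in Python for consistency with the models above even though the real logic would run in the browser. The layout of the persisted state is an assumption: the keys queryEditors, title, and sql are placeholders, and the function name tabs_from_local_storage is hypothetical.

```python
import json


def tabs_from_local_storage(raw_state):
    """Convert a persisted localStorage JSON string into tab payloads.

    Returns an empty list when nothing is stored or the blob is unreadable,
    so a corrupted state never blocks SQL Lab from loading.
    """
    if not raw_state:
        return []
    try:
        state = json.loads(raw_state)
    except json.JSONDecodeError:
        return []
    if not isinstance(state, dict):
        return []
    # "queryEditors", "title" and "sql" are assumed key names.
    return [
        {"label": editor.get("title", "Untitled Query"), "sql": editor.get("sql", "")}
        for editor in state.get("queryEditors", [])
    ]


raw = json.dumps({"queryEditors": [{"id": "a1", "title": "Trips", "sql": "SELECT 1"}]})
tabs = tabs_from_local_storage(raw)
```

Each returned payload would then be posted to the backend to create a tab, after which the localStorage entry can be cleared so the migration runs only once.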

Rejected Alternatives

None.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

0 reactions
betodealmeida commented, Jul 12, 2019

“Why are both query_id and query required in TabState?”

query_id is the actual foreign-key column. query is just a lazily loaded relationship attribute, not a real column; accessing it returns the related Query object.
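
The distinction in the reply above can be illustrated with a plain-Python analogy. This is not how SQLAlchemy implements relationship(); it only shows the idea of one stored value (query_id) and one attribute resolved on access (query). The QUERIES dictionary and the lookups counter are hypothetical additions for the demonstration.

```python
# Hypothetical stand-in for rows of the query table.
QUERIES = {7: {"id": 7, "sql": "SELECT 1"}}


class TabState:
    def __init__(self, query_id):
        self.query_id = query_id  # the stored value, i.e. the real column
        self.lookups = 0          # counts how often the related row is fetched

    @property
    def query(self):
        """Resolved from query_id on access; nothing extra is stored."""
        self.lookups += 1
        return QUERIES.get(self.query_id)


tab = TabState(query_id=7)
lookups_before = tab.lookups  # no lookup happens until .query is read
related = tab.query
```

Unlike this analogy, which looks the row up on every access, SQLAlchemy normally keeps a loaded relationship on the instance until it is expired.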
