[sqllab] How can we make large Superset queries load faster?
I’m looking into ways to make large Superset queries load faster, and I’d like members of the community to share ideas here. Users often run a long query for a slice and then wait a long time for tens of thousands of rows they don’t intend to look at. Before users can see any results, the whole query has to finish, and there is usually a round trip to S3. This takes a really long time.
For inspiration, the Presto/Hive CLI returns almost immediately because it uses something like the `less` bash command to show results as soon as some rows are available.
There is a way to know whether any data has been loaded, in `handle_cursor`: (https://github.com/apache/incubator-superset/blob/31a995714df49e55ff69474378845fd8af605d4b/superset/db_engine_specs.py#L617)
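To make the "show rows as soon as there are some" idea concrete, here is a minimal sketch using the standard DB-API `fetchmany` call. It is not Superset's code: `stream_rows` is a hypothetical helper, and an in-memory SQLite table stands in for Presto/Hive. Whether a given driver really delivers rows before the query finishes depends on that driver.

```python
import sqlite3
from typing import Iterator, List, Tuple


def stream_rows(cursor, batch_size: int = 100) -> Iterator[List[Tuple]]:
    """Yield rows in batches instead of waiting for fetchall()."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


# Stand-in database (assumption): an in-memory SQLite table with 250 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(250)])

cursor = conn.execute("SELECT id FROM events ORDER BY id")
# Each batch could be pushed to the UI as soon as it arrives.
batch_sizes = [len(batch) for batch in stream_rows(cursor, batch_size=100)]
```

With something like this, the first batch could be rendered while later batches are still being fetched, so only one query is needed.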
The most basic idea is to turn every query into 2 queries: one with a small limit (100?), shown with a "View more" button / loading icon so users don’t wrongly assume that’s all the results, while the actual full query keeps running.
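A rough sketch of that two-query idea, for discussion only. `preview_sql` and `PREVIEW_LIMIT` are hypothetical names, SQLite stands in for the real engine, and the two queries run one after the other here, whereas in practice the full query would keep running in the background.

```python
import sqlite3

PREVIEW_LIMIT = 100  # the "small limit (100?)" from the proposal


def preview_sql(sql: str, limit: int = PREVIEW_LIMIT) -> str:
    """Wrap the user's query so that at most `limit` rows come back."""
    inner = sql.rstrip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS preview LIMIT {limit}"


# Stand-in database (assumption): an in-memory SQLite table with 250 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(250)])

user_sql = "SELECT id FROM events ORDER BY id;"

# Query 1: the small preview, shown right away.
preview_rows = conn.execute(preview_sql(user_sql)).fetchall()

# Query 2: the actual full query (would run asynchronously in practice).
full_rows = conn.execute(user_sql).fetchall()

# Drives the "View more" button / loading icon.
has_more = len(full_rows) > len(preview_rows)
```

The obvious cost is that the engine does the work twice, which is why a single-query approach would be preferable.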
I think we can do better than this starting idea. In particular, we shouldn’t need 2 queries. Please share your thoughts.
@fabianmenges @hughhhh @john-bodley @michellethomas @mistercrunch @jeffreythewang
Issue Analytics
- Created 6 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
The new proposed approach is to have a limit in the UI that the user can see and configure. This UI limit will have some JavaScript validation to prevent the user from exceeding a particular value. This value will be set to 10K by default (other querying tools use a similar default, but ours is more flexible).
The UI will always show at most 1K rows (also configurable); only by exporting a CSV will you be able to see more than the UI limit.
It will look something like the image below (from Jeff’s PR: https://github.com/apache/incubator-superset/pull/4941)
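To illustrate the two limits, here is a small sketch of the logic. It is written in Python for consistency with the rest of Superset, even though the proposal puts the validation in JavaScript. All names are hypothetical, and it reads the 10K value as the maximum a user may enter, which is an assumption.

```python
DEFAULT_QUERY_LIMIT = 10_000  # maximum the user may request (10K, assumed reading)
DISPLAY_LIMIT = 1_000         # rows actually rendered in the UI (1K)


def validate_limit(requested: int, max_limit: int = DEFAULT_QUERY_LIMIT) -> int:
    """Reject limits that are non-positive or above the configured maximum."""
    if requested < 1:
        raise ValueError("limit must be a positive integer")
    if requested > max_limit:
        raise ValueError(f"limit {requested} exceeds the maximum of {max_limit}")
    return requested


def rows_for_display(rows: list, display_limit: int = DISPLAY_LIMIT) -> list:
    """Return only the rows the UI shows; the rest is reachable via CSV export."""
    return rows[:display_limit]


query_limit = validate_limit(5_000)
fetched = list(range(query_limit))   # stand-in for the query result
shown = rows_for_display(fetched)    # what the results table would render
```

Here the query fetches 5,000 rows, the table shows the first 1,000, and the CSV export would contain all 5,000.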
I second this. I have a table with 2 million rows, and I have to remember to manually add `limit` to my query every time I want to try something new in sqllab. And with `limit`, the generated visualization will also be limited, which is not ideal and makes it extremely difficult to test new visualizations.