[sqllab] How can we make large Superset queries load faster?

I’m looking into ways to make large Superset queries load faster, and I’d like members of the community to share ideas here. Users often run a long query for a slice and then wait a long time for tens of thousands of rows they don’t intend to look at. Before users can see any results, the whole query has to finish, and there is usually a round trip to S3. This takes a really long time.

For inspiration, the Presto/Hive CLI returns almost immediately because it uses a pager such as less to display results as soon as the first rows are available.
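
Pager-style loading can be approximated with the DB-API 2.0 (PEP 249) cursor method fetchmany, which returns rows in batches instead of waiting for fetchall. The following is a minimal sketch, assuming Python’s built-in sqlite3 as a stand-in for a Presto/Hive cursor; iter_batches is a hypothetical helper, not part of Superset, and whether rows really arrive incrementally depends on the driver and engine.

```python
import sqlite3
from typing import Iterator, List


def iter_batches(cursor: sqlite3.Cursor, batch_size: int = 100) -> Iterator[List[tuple]]:
    """Yield rows in fixed-size batches as the cursor produces them, like a pager."""
    while True:
        # fetchmany asks the driver for at most batch_size more rows, so the
        # first batch can be shown before the whole result set is fetched.
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            return
        yield batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])
    batches = iter_batches(conn.execute("SELECT x FROM t ORDER BY x"))
    first = next(batches)  # rendered while later batches are still unfetched
    print(len(first), first[0])  # 100 (0,)
```

Because the generator pulls one batch at a time, a UI could render the first 100 rows and request the next batch only when the user scrolls or clicks for more.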

There is a way to know whether any data has been loaded in handle_cursor: (https://github.com/apache/incubator-superset/blob/31a995714df49e55ff69474378845fd8af605d4b/superset/db_engine_specs.py#L617)

https://github.com/apache/incubator-superset/blob/31a995714df49e55ff69474378845fd8af605d4b/superset/db_engine_specs.py#L185

The most basic idea is to turn every query into two queries: one with a small limit (100?), shown with a View more button or loading icon so users don’t wrongly assume those are all the results, and the full query, which keeps running in the background.
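
The two-query idea could look roughly like the sketch below. It assumes a DB-API connection that can be shared with a worker thread (sqlite3 opened with check_same_thread=False here) and applies the preview limit by wrapping the user’s SQL in a subquery, which assumes a single SELECT statement without a trailing line comment. PREVIEW_LIMIT and run_two_queries are hypothetical names.

```python
import sqlite3
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Tuple

PREVIEW_LIMIT = 100  # the "(100?)" floated in the issue; not a settled value


def run_two_queries(conn: sqlite3.Connection, sql: str,
                    executor: ThreadPoolExecutor) -> Tuple[List[tuple], Future]:
    """Return a quick preview now and the full result as a Future."""
    body = sql.strip().rstrip(";")
    # Query 1: the same SQL capped at PREVIEW_LIMIT rows, shown immediately
    # next to a View more button or loading icon.
    preview = conn.execute(
        f"SELECT * FROM ({body}) AS preview LIMIT {PREVIEW_LIMIT}"
    ).fetchall()
    # Query 2: the full, uncapped query keeps running in a worker thread.
    full = executor.submit(lambda: conn.execute(body).fetchall())
    return preview, full


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])
    with ThreadPoolExecutor(max_workers=1) as pool:
        preview, full = run_two_queries(conn, "SELECT x FROM t;", pool)
        print(len(preview))        # 100
        print(len(full.result()))  # 250
```

The cost of this design is that the database does the work twice, which is the drawback the following paragraph points out.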

I think we can do better than this starting idea; in particular, we shouldn’t need two queries. Please share your thoughts.
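
One way to avoid the second query is to take the preview rows from the same cursor with fetchmany and let a background worker drain the rest. This is a sketch, assuming sqlite3 as a stand-in for the engine’s DB-API cursor, opened with check_same_thread=False so the cursor can be handed to a worker thread; run_once_with_preview is a hypothetical helper.

```python
import sqlite3
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Tuple


def run_once_with_preview(conn: sqlite3.Connection, sql: str,
                          executor: ThreadPoolExecutor,
                          preview_size: int = 100) -> Tuple[List[tuple], Future]:
    """Execute sql once; return the first rows now and the remainder as a Future."""
    cursor = conn.execute(sql)
    # The preview comes off the same cursor, so no second query is issued.
    preview = cursor.fetchmany(preview_size)
    # The worker keeps pulling the remaining rows from that cursor; the caller
    # must not reuse conn until the Future has resolved.
    rest = executor.submit(cursor.fetchall)
    return preview, rest


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])
    with ThreadPoolExecutor(max_workers=1) as pool:
        preview, rest = run_once_with_preview(conn, "SELECT x FROM t ORDER BY x", pool)
        print(len(preview))        # 100, available immediately
        print(len(rest.result()))  # 150, the remaining rows
```

Concatenating the preview with the Future’s result gives the complete result set, so the View more action only has to wait for rows that are already being fetched.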

@fabianmenges @hughhhh @john-bodley @michellethomas @mistercrunch @jeffreythewang

Created 6 years ago
Comments: 8 (5 by maintainers)

Top GitHub Comments

9 reactions

timifasubaa commented, Oct 15, 2018

The new proposed approach is to add a row limit to the UI that users can see and configure. This UI limit will have some JavaScript validation to prevent users from exceeding a maximum value, which will be 10K by default (other query tools use a similar default, but ours is more flexible).
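
The validation rule can be written as a small function. The comment describes JavaScript validation in the frontend; this Python version only illustrates the same rule and could also serve as a server-side check. validate_ui_limit and DEFAULT_MAX_ROWS are hypothetical names; only the 10K default comes from the comment.

```python
DEFAULT_MAX_ROWS = 10_000  # the 10K default from the comment; meant to be configurable


def validate_ui_limit(requested: int, max_rows: int = DEFAULT_MAX_ROWS) -> int:
    """Return the requested row limit if it is allowed, otherwise raise ValueError."""
    if requested < 1:
        raise ValueError("row limit must be at least 1")
    if requested > max_rows:
        raise ValueError(f"row limit cannot exceed {max_rows}")
    return requested


if __name__ == "__main__":
    print(validate_ui_limit(5_000))  # 5000
    try:
        validate_ui_limit(50_000)
    except ValueError as exc:
        print(exc)  # row limit cannot exceed 10000
```

Rejecting out-of-range values, rather than silently clamping them, keeps the user aware that the limit they typed was not applied.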

The UI will always display at most 1K rows (also configurable), and only by exporting to CSV will you be able to see more rows than the UI limit.
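
A sketch of splitting a result set between what the grid displays and what the CSV export contains, using the standard csv module. The 1K display cap comes from the comment; split_for_display and rows_to_csv are hypothetical helpers, and the sketch assumes the full result set is already in memory.

```python
import csv
import io
from typing import List, Sequence, Tuple

DISPLAY_ROWS = 1_000  # "at most 1K rows (also configurable)"


def split_for_display(rows: Sequence[tuple],
                      display_rows: int = DISPLAY_ROWS) -> Tuple[List[tuple], bool]:
    """Return the rows the grid should render and whether any rows were hidden."""
    shown = list(rows[:display_rows])
    return shown, len(rows) > display_rows


def rows_to_csv(header: Sequence[str], rows: Sequence[tuple]) -> str:
    """Serialize every row, not just the displayed ones, for CSV export."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default line terminator is "\r\n"
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [(i, f"name-{i}") for i in range(2_500)]
    shown, truncated = split_for_display(rows)
    print(len(shown), truncated)  # 1000 True
    export = rows_to_csv(["id", "name"], rows)
    print(len(export.splitlines()))  # 2501 (header plus every row)
```

The truncated flag is what a UI would use to tell the user that more rows exist and are only available through the export.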

It will look something like the mockup in Jeff’s PR (https://github.com/apache/incubator-superset/pull/4941).

2 reactions

yuha0 commented, Mar 19, 2018

I second this. I have a table with 2 million rows, and I have to remember to manually add a LIMIT to my query every time I want to try something new in SQL Lab. And with a limit, the generated visualization is also limited, which is not ideal and makes it extremely difficult to test new visualizations.
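
Adding a LIMIT by hand could be avoided by wrapping each SQL Lab query in an outer SELECT with a row limit before it is sent to the database. This is a minimal sketch, assuming a single SELECT statement with no trailing line comment (a trailing -- comment would swallow the closing parenthesis); apply_row_limit is a hypothetical helper, not Superset’s actual code. It does not address the commenter’s second point: a visualization built from the limited result is still limited.

```python
import sqlite3


def apply_row_limit(sql: str, limit: int) -> str:
    """Wrap a single SELECT statement in an outer query that caps the row count."""
    if limit < 1:
        raise ValueError("limit must be positive")
    # Drop surrounding whitespace and a trailing semicolon so the statement
    # can be embedded as a subquery.
    body = sql.strip().rstrip(";").strip()
    # An inner LIMIT smaller than `limit` still wins; the outer one only caps.
    return f"SELECT * FROM ({body}) AS limited LIMIT {int(limit)}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big (x INTEGER)")
    conn.executemany("INSERT INTO big VALUES (?)", [(i,) for i in range(2_000)])
    limited = apply_row_limit("SELECT x FROM big;", 100)
    print(limited)  # SELECT * FROM (SELECT x FROM big) AS limited LIMIT 100
    print(len(conn.execute(limited).fetchall()))  # 100
```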