[sqllab] How can we make large Superset queries load faster?

I’m looking into ways to make large Superset queries load faster, and I’d like members of the community to share ideas here. Users often run a long query for a slice and then wait a long time for tens of thousands of rows they don’t intend to look at. Before users can see any results, the whole query has to finish, and there is usually a round trip to S3. This takes a really long time.

For inspiration, the Presto/Hive CLI returns almost immediately because it uses something like the less command to display results as soon as some rows are available.

There is a way to know whether any data has been loaded, in handle_cursor: https://github.com/apache/incubator-superset/blob/31a995714df49e55ff69474378845fd8af605d4b/superset/db_engine_specs.py#L617

https://github.com/apache/incubator-superset/blob/31a995714df49e55ff69474378845fd8af605d4b/superset/db_engine_specs.py#L185
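As a rough illustration of the "show rows as soon as some exist" idea, here is a minimal sketch that pulls results from a DB-API cursor in batches with fetchmany instead of waiting for fetchall. It is not Superset's handle_cursor: stream_rows and demo_connection are hypothetical names, and sqlite3 is only a stand-in so the example runs anywhere. Whether the first batch really arrives before the query finishes depends on the driver and the engine.

```python
import sqlite3
from typing import Iterator, List, Tuple


def stream_rows(cursor, batch_size: int = 100) -> Iterator[List[Tuple]]:
    """Yield batches of rows as the driver hands them over (hypothetical helper)."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield batch


def demo_connection(n_rows: int = 1000) -> sqlite3.Connection:
    """Build an in-memory table standing in for a large result set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [(f"row-{i}",) for i in range(n_rows)],
    )
    return conn


conn = demo_connection()
cursor = conn.execute("SELECT id, payload FROM events ORDER BY id")

batches = stream_rows(cursor, batch_size=100)
first_batch = next(batches)  # could be rendered while later batches are still pending
remaining = sum(len(batch) for batch in batches)
print(len(first_batch), remaining)
```

The caller decides what to do with the first batch (for example, render it) while the generator keeps draining the cursor.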

The most basic idea is to turn every query into two queries: one with a small limit (100?) that returns quickly, shown with a "View more" button or loading icon so users don’t wrongly assume those are all the results, while the actual full query keeps running.

I think we can do better than this starting idea. In particular, we shouldn’t need 2 queries. Please share your thoughts.

@fabianmenges @hughhhh @john-bodley @michellethomas @mistercrunch @jeffreythewang

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

9 reactions
timifasubaa commented, Oct 15, 2018

The new proposed approach is to have a limit in the UI that the user can see and configure. This UI limit will have some JavaScript validation to prevent the user from exceeding a maximum value, which will be set to 10K by default (other querying tools use a similar default, but ours is more flexible).

The UI will always show at most 1K rows (also configurable), and only by exporting to CSV will you be able to see more than the UI limit.

It will look something like the screenshot in Jeff’s PR (https://github.com/apache/incubator-superset/pull/4941).
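The two limits described in this comment could be sketched roughly as follows. The comment describes the validation as JavaScript in the UI; this Python version only illustrates the logic. The constant and function names are hypothetical, and the values 10K and 1K are the defaults mentioned above, not confirmed Superset settings.

```python
from typing import List, Sequence, Tuple

# Assumed defaults taken from the comment above; both are meant to be configurable.
DEFAULT_MAX_QUERY_LIMIT = 10_000
DEFAULT_DISPLAY_LIMIT = 1_000


def validate_query_limit(requested: int, max_limit: int = DEFAULT_MAX_QUERY_LIMIT) -> int:
    """Reject a user-chosen row limit that is non-positive or above the ceiling."""
    if requested < 1:
        raise ValueError("limit must be at least 1")
    if requested > max_limit:
        raise ValueError(f"limit {requested} exceeds the maximum of {max_limit}")
    return requested


def rows_for_display(
    rows: Sequence[Tuple], display_limit: int = DEFAULT_DISPLAY_LIMIT
) -> Tuple[List[Tuple], bool]:
    """Return the rows to render and whether more exist (CSV export only)."""
    shown = list(rows[:display_limit])
    return shown, len(rows) > len(shown)


limit = validate_query_limit(5_000)
result = [(i,) for i in range(limit)]  # stand-in for the query result
shown, truncated = rows_for_display(result)
print(len(shown), truncated)
```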

2 reactions
yuha0 commented, Mar 19, 2018

I second this. I have a table with 2 million rows, and I have to remember to manually add a limit to my query every time I want to try something new in SQL Lab. And with a limit, the generated visualization is also limited, which is not ideal and makes it extremely difficult to test new visualizations.
