Stream all results for arbitrary SQL and canned queries
See original GitHub issue

I think there is a difficulty with canned queries.
When I want to stream all results of a canned query TwoDays, I only get the first 1,000 records.
Example:
http://myserver/history_sample/two_days.csv?_stream=on
returns only the first 1,000 records.
If I do the same with the whole database, i.e.
http://myserver/history_sample/database.csv?_stream=on
I correctly get all records.
Any ideas?
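For anyone wanting to reproduce the report, a minimal sketch along these lines could work — assuming the requests library is available and reusing the placeholder URLs from the report above; none of this code comes from the original issue:

```python
# Hedged reproduction sketch: stream each CSV endpoint and count the data rows.
# Assumes the requests library; the URLs are the placeholders from the report.
import requests

def count_streamed_rows(url):
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        lines = response.iter_lines(decode_unicode=True)
        next(lines, None)  # skip the CSV header row
        return sum(1 for line in lines if line)

# Canned query: reportedly stops after the first 1,000 records.
print(count_streamed_rows("http://myserver/history_sample/two_days.csv?_stream=on"))

# Full table: reportedly streams all records.
print(count_streamed_rows("http://myserver/history_sample/database.csv?_stream=on"))
```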
Issue Analytics
- Created: 4 years ago
- Comments: 23 (22 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
It’s interesting to note WHY the time limit works against this so well.
The time limit as-implemented looks like this:
https://github.com/simonw/datasette/blob/5f9f567acbc58c9fcd88af440e68034510fb5d2b/datasette/utils/__init__.py#L181-L201
The key here is conn.set_progress_handler(handler, n), which specifies that the handler function should be called every n SQLite operations. The handler function then checks to see if too much time has transpired and conditionally cancels the query.
This also doubles up as a “maximum number of operations” guard, which is what’s happening when you attempt to fetch an infinite number of rows from an infinite table.
That limit code could even be extended to say “exit the query after either 5s or 50,000,000 operations”.
I don’t think that’s necessary though.
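For illustration, here is a minimal sketch of that pattern — a progress handler that cancels the current query once either a time budget or an operation budget is exceeded. This is not the Datasette code linked above; the function name, budget values and demo query are invented for the example:

```python
import sqlite3
import time

def set_query_guard(conn, time_limit_s=5.0, max_ops=50_000_000, n=1000):
    """Cancel any query on `conn` that exceeds a time or operation budget."""
    deadline = time.monotonic() + time_limit_s
    ops = {"count": 0}

    def handler():
        # SQLite calls this every `n` virtual-machine operations. Returning a
        # non-zero value aborts the current query, which surfaces in Python as
        # sqlite3.OperationalError: interrupted.
        ops["count"] += n
        return 1 if time.monotonic() > deadline or ops["count"] > max_ops else 0

    conn.set_progress_handler(handler, n)

conn = sqlite3.connect(":memory:")
set_query_guard(conn, time_limit_s=0.05)  # tiny budget so the demo triggers quickly
try:
    conn.execute("""
        WITH RECURSIVE counter(x) AS (
            SELECT 1 UNION ALL SELECT x + 1 FROM counter LIMIT 100000
        )
        SELECT count(*) FROM counter a, counter b
    """).fetchall()
except sqlite3.OperationalError as err:
    print("query cancelled:", err)
```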
To be honest, I’m having trouble with the idea of dropping max_returned_rows, mainly because what Datasette does (allow arbitrary untrusted SQL queries) is dangerous, so I’ve designed in multiple redundant defence-in-depth mechanisms right from the start.

I should research how much overhead creating a new connection costs - it may be that an easy way to solve this is to create a dedicated connection for the query and then close that connection at the end.
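As a sketch of that last idea — a dedicated, per-query connection that is closed when the stream finishes — something like the following could work. The function name, parameters and commented-out example are assumptions for illustration, not part of Datasette:

```python
import sqlite3

def stream_query(db_path, sql, params=()):
    # Open a fresh connection just for this query, yield rows lazily, and
    # always close the connection when the consumer stops reading.
    conn = sqlite3.connect(db_path)
    try:
        # A time/operation guard like the one sketched earlier could still be
        # attached here with conn.set_progress_handler().
        cursor = conn.execute(sql, params)
        for row in cursor:
            yield row
    finally:
        conn.close()

# Usage: rows are fetched lazily, so an arbitrarily large result can be
# streamed out (e.g. as CSV) without ever holding it all in memory.
# for row in stream_query("history_sample.db", "SELECT * FROM some_table"):
#     write_csv_row(row)
```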