
Stream all results for arbitrary SQL and canned queries

See original GitHub issue

I think there is a problem with canned queries.

When I try to stream all the results of a canned query, TwoDays, I get only the first 1,000 records.

Example: http://myserver/history_sample/two_days.csv?_stream=on

returns only the first 1,000 records.

If I do the same with the whole database, i.e. http://myserver/history_sample/database.csv?_stream=on

I correctly get all the records.

Any ideas?
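A quick way to confirm the truncation is to count the rows the endpoint actually streams back. A minimal sketch using Python's standard library, with the URL from the report above (myserver is a placeholder host):

```python
import csv
import io
import urllib.request

# URL from the report; "myserver" is a placeholder host.
url = "http://myserver/history_sample/two_days.csv?_stream=on"

with urllib.request.urlopen(url) as resp:
    reader = csv.reader(io.TextIOWrapper(resp, encoding="utf-8"))
    next(reader)                   # skip the CSV header row
    print(sum(1 for _ in reader))  # per the report: 1000, despite _stream=on
```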

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 23 (22 by maintainers)

Top GitHub Comments

1 reaction
simonw commented, Sep 27, 2022

It’s interesting to note WHY the time limit works against this so well.

The time limit as-implemented looks like this:

https://github.com/simonw/datasette/blob/5f9f567acbc58c9fcd88af440e68034510fb5d2b/datasette/utils/__init__.py#L181-L201

The key here is conn.set_progress_handler(handler, n) - which specifies that the handler function should be called every n SQLite operations.

The handler function then checks to see if too much time has transpired and conditionally cancels the query.

This also doubles up as a “maximum number of operations” guard, which is what’s happening when you attempt to fetch an infinite number of rows from an infinite table.

That limit code could even be extended to say “exit the query after either 5s or 50,000,000 operations”.

I don’t think that’s necessary though.
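For illustration, here is a minimal sketch of that mechanism using the stdlib sqlite3 module (a simplification of the Datasette utility linked above, not a copy of it):

```python
import sqlite3
import time

def sqlite_timelimit(conn, ms, n=1000):
    """Cancel any query on conn that runs longer than ms milliseconds."""
    deadline = time.monotonic() + ms / 1000

    def handler():
        # SQLite calls this roughly every n virtual-machine operations;
        # returning non-zero aborts the running query with
        # sqlite3.OperationalError ("interrupted").
        return 1 if time.monotonic() > deadline else 0

    conn.set_progress_handler(handler, n)

conn = sqlite3.connect(":memory:")
sqlite_timelimit(conn, ms=50)
try:
    # An unbounded recursive CTE: without the handler this never returns.
    conn.execute(
        "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c) "
        "SELECT count(*) FROM c"
    ).fetchone()
except sqlite3.OperationalError as exc:
    print("query cancelled:", exc)
```

Counting handler invocations as well as checking the clock would give the “exit after either 5s or 50,000,000 operations” variant described above.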

To be honest, I’m having trouble with the idea of dropping max_returned_rows, mainly because what Datasette does (allowing arbitrary untrusted SQL queries) is dangerous, so I’ve designed in multiple redundant defence-in-depth mechanisms right from the start.

1 reaction
simonw commented, Mar 21, 2022

I should research how much overhead creating a new connection costs - it may be that an easy way to solve this is to create a dedicated connection for the query and then close that connection at the end.
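A minimal sketch of that idea with the stdlib sqlite3 module (the file name and helper are hypothetical): open a throwaway connection with no limits attached, stream from it, close it when the stream ends, plus a rough timing of the per-connection cost:

```python
import sqlite3
import time

DB_PATH = "history_sample.db"  # hypothetical database file

def stream_all(sql, params=()):
    """Yield every row from a dedicated, short-lived connection."""
    conn = sqlite3.connect(DB_PATH)
    try:
        yield from conn.execute(sql, params)
    finally:
        conn.close()

# Rough measurement of the per-connection overhead:
start = time.perf_counter()
for _ in range(1_000):
    sqlite3.connect(DB_PATH).close()
elapsed = time.perf_counter() - start
print(f"avg connect+close: {elapsed / 1_000 * 1e6:.0f} µs")
```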

Read more comments on GitHub >
