Stream all results for arbitrary SQL and canned queries
See original GitHub issue

I think there is a difficulty with canned queries.
When I want to stream all results of a canned query TwoDays, I only get the first 1,000 records.
Example:
http://myserver/history_sample/two_days.csv?_stream=on
returns only the first 1,000 records.
If I do the same with the whole database, i.e.
http://myserver/history_sample/database.csv?_stream=on
I correctly get all records.
Any ideas?
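For anyone wanting to reproduce the report, a minimal sketch along these lines could work — assuming the requests library is available and reusing the placeholder URLs from the report above; none of this code comes from the original issue:

```python
# Hedged reproduction sketch: stream each CSV endpoint and count the data rows.
# Assumes the requests library; the URLs are the placeholders from the report.
import requests

def count_streamed_rows(url):
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        lines = response.iter_lines(decode_unicode=True)
        next(lines, None)  # skip the CSV header row
        return sum(1 for line in lines if line)

# Canned query: reportedly stops after the first 1,000 records.
print(count_streamed_rows("http://myserver/history_sample/two_days.csv?_stream=on"))

# Full table: reportedly streams all records.
print(count_streamed_rows("http://myserver/history_sample/database.csv?_stream=on"))
```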
Issue Analytics
- Created: 4 years ago
- Comments: 23 (22 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
It’s interesting to note WHY the time limit works against this so well.
The time limit as-implemented looks like this:
https://github.com/simonw/datasette/blob/5f9f567acbc58c9fcd88af440e68034510fb5d2b/datasette/utils/__init__.py#L181-L201
The key here is conn.set_progress_handler(handler, n), which specifies that the handler function should be called every n SQLite operations. The handler function then checks to see if too much time has transpired and conditionally cancels the query.
This also doubles up as a “maximum number of operations” guard, which is what’s happening when you attempt to fetch an infinite number of rows from an infinite table.
That limit code could even be extended to say “exit the query after either 5s or 50,000,000 operations”.
I don’t think that’s necessary though.
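For illustration, here is a minimal sketch of that pattern — a progress handler that cancels the current query once either a time budget or an operation budget is exceeded. This is not the Datasette code linked above; the function name, budget values and demo query are invented for the example:

```python
import sqlite3
import time

def set_query_guard(conn, time_limit_s=5.0, max_ops=50_000_000, n=1000):
    """Cancel any query on `conn` that exceeds a time or operation budget."""
    deadline = time.monotonic() + time_limit_s
    ops = {"count": 0}

    def handler():
        # SQLite calls this every `n` virtual-machine operations. Returning a
        # non-zero value aborts the current query, which surfaces in Python as
        # sqlite3.OperationalError: interrupted.
        ops["count"] += n
        return 1 if time.monotonic() > deadline or ops["count"] > max_ops else 0

    conn.set_progress_handler(handler, n)

conn = sqlite3.connect(":memory:")
set_query_guard(conn, time_limit_s=0.05)  # tiny budget so the demo triggers quickly
try:
    conn.execute("""
        WITH RECURSIVE counter(x) AS (
            SELECT 1 UNION ALL SELECT x + 1 FROM counter LIMIT 100000
        )
        SELECT count(*) FROM counter a, counter b
    """).fetchall()
except sqlite3.OperationalError as err:
    print("query cancelled:", err)
```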
To be honest, I’m having trouble with the idea of dropping max_returned_rows, mainly because what Datasette does (allow arbitrary untrusted SQL queries) is dangerous, so I’ve designed in multiple redundant defence-in-depth mechanisms right from the start.

I should research how much overhead creating a new connection costs - it may be that an easy way to solve this is to create a dedicated connection for the query and then close that connection at the end.
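As a sketch of that last idea — a dedicated, per-query connection that is closed when the stream finishes — something like the following could work. The function name, parameters and commented-out example are assumptions for illustration, not part of Datasette:

```python
import sqlite3

def stream_query(db_path, sql, params=()):
    # Open a fresh connection just for this query, yield rows lazily, and
    # always close the connection when the consumer stops reading.
    conn = sqlite3.connect(db_path)
    try:
        # A time/operation guard like the one sketched earlier could still be
        # attached here with conn.set_progress_handler().
        cursor = conn.execute(sql, params)
        for row in cursor:
            yield row
    finally:
        conn.close()

# Usage: rows are fetched lazily, so an arbitrarily large result can be
# streamed out (e.g. as CSV) without ever holding it all in memory.
# for row in stream_query("history_sample.db", "SELECT * FROM some_table"):
#     write_csv_row(row)
```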