register_output_renderer() should support streaming data
I’d like to implement this by first extending the `register_output_renderer()` hook to support streaming huge responses, then switching CSV to use the plugin hook, in addition to TSV using it.
_Originally posted by @simonw in https://github.com/simonw/datasette/issues/1096#issuecomment-732542285_
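For context, a minimal sketch of the dict-based contract the hook has today, which this issue proposes to extend. The CSV-building body and the simplified `render` signature are assumptions for illustration, not Datasette's actual code:

```python
# Sketch of the current dict-returning register_output_renderer() contract.
# The render callback signature is simplified; the point is that the whole
# response body is built in memory, which is the limitation for huge results.
import csv
import io


def render_csv(rows, columns):
    # Buffer the entire output before returning it - no streaming possible.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    for row in rows:
        writer.writerow(row)
    return {"body": buffer.getvalue(), "content_type": "text/csv"}


def register_output_renderer(datasette):
    # The hook currently returns a dictionary describing one renderer.
    return {"extension": "csv", "render": render_csv}
```

Because the body must exist in full before the response is sent, memory use grows with the result size - hence the proposal below to stream instead.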
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 12 (12 by maintainers)
Top GitHub Comments
Idea: instead of returning a dictionary, `register_output_renderer` could return an object. The object could have the following properties:

- `.extension` - the extension to use
- `.can_render(...)` - says if it can render this
- `.can_stream(...)` - says if streaming is supported
- `async .stream_rows(rows_iterator, send)` - method that loops through all rows and uses `send` to send them to the response in the correct format

I can then deprecate the existing `dict` return type for 1.0.

Ha! That was your idea (and a good one).
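As a concrete sketch, the object contract proposed above might look like the following. The property names come from the comment; the class name, the CSV formatting, and the driver coroutine are assumptions:

```python
# Hypothetical renderer object implementing the proposed contract:
# .extension, .can_render(...), .can_stream(), and async .stream_rows(...).
import asyncio


class CSVRenderer:
    extension = "csv"

    def can_render(self, columns):
        return True

    def can_stream(self):
        return True

    async def stream_rows(self, rows_iterator, send):
        # Loop through all rows and use send() to push each formatted
        # chunk into the response, instead of buffering everything.
        for row in rows_iterator:
            await send(",".join(str(value) for value in row) + "\r\n")


async def demo():
    # Collect what send() would have written to the response.
    chunks = []

    async def send(chunk):
        chunks.append(chunk)

    await CSVRenderer().stream_rows([(1, "a"), (2, "b")], send)
    return chunks
```

Because `stream_rows` is a coroutine, each chunk can be awaited onto the wire as it is produced, which is what allows huge responses without holding them in memory.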
But it’s probably worth measuring to see what overhead it adds. It did require both passing in the database and making the whole thing `async`.

Just timing the queries themselves: `AsGeoJSON(geometry) as geometry` takes 10.235 ms.

Looking at the network panel: the `fetch` request. I’m not sure how best to time the GeoJSON generation, but it would be interesting to check. Maybe I’ll write a plugin to add query times to response headers.
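One way to surface those timings is the standard `Server-Timing` response header, which browsers display in the network panel. The `timed()` helper below is hypothetical, not a Datasette API; a real plugin would attach this around the request cycle:

```python
# Sketch: time a callable and expose the duration via Server-Timing.
# timed() and add_timing_header() are illustrative helpers, not Datasette APIs.
import time


def timed(fn, *args, **kwargs):
    # Run fn and return (result, elapsed milliseconds).
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


def add_timing_header(headers, elapsed_ms):
    # Server-Timing is a standard header that browser dev tools render
    # alongside the request, which suits the debugging described above.
    headers["Server-Timing"] = f"query;dur={elapsed_ms:.3f}"
    return headers
```

This would make per-query times visible without instrumenting the client.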
The other thing to consider with async streaming is that it might be well-suited for a slower response. When I have to get the whole result and send a response in a fixed amount of time, I need the most efficient query possible. If I can hang onto a connection and get things one chunk at a time, maybe it’s ok if there’s some overhead.