
register_output_renderer() should support streaming data

See original GitHub issue

I’d like to implement this by first extending the register_output_renderer() hook to support streaming huge responses, then switching CSV to use the plugin hook in addition to TSV using it.

_Originally posted by @simonw in https://github.com/simonw/datasette/issues/1096#issuecomment-732542285_
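For context, the existing `register_output_renderer()` hook returns a dictionary describing the renderer, and the render callback builds the entire response body in memory — which is exactly why it cannot handle huge result sets. A rough sketch of that shape (in a real plugin the hook function carries datasette's `@hookimpl` decorator, and the callback may return a `Response` object; the dict form here is a simplification):

```python
# Sketch of the current dict-based register_output_renderer hook.
# In a real plugin this function would be decorated with @hookimpl.
def register_output_renderer(datasette):
    async def render_tsv(rows, columns, **kwargs):
        # Builds the whole body in memory -- the limitation this
        # issue is about: there is no way to stream huge result sets.
        lines = ["\t".join(columns)]
        for row in rows:
            lines.append("\t".join(str(v) for v in row))
        return {
            "body": "\n".join(lines),
            "content_type": "text/tab-separated-values",
        }

    return {"extension": "tsv", "render": render_tsv}
```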

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

2 reactions
simonw commented, Jan 6, 2021

Idea: instead of returning a dictionary, register_output_renderer could return an object. The object could have the following properties:

  • .extension - the extension to use
  • .can_render(...) - says if it can render this
  • .can_stream(...) - says if streaming is supported
  • async .stream_rows(rows_iterator, send) - method that loops through all rows and uses send to send them to the response in the correct format

I can then deprecate the existing dict return type for 1.0.
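The object-based interface above could be sketched like this (the class name and exact call signatures are assumptions; only the attribute and method names come from the comment):

```python
# Hedged sketch of the proposed renderer object. The extension,
# can_render, can_stream, and stream_rows names come from the
# comment above; everything else here is assumed for illustration.
class OutputRenderer:
    extension = "csv"  # the extension to use

    def can_render(self, columns):
        # Says if this renderer can handle the given result shape.
        return True

    def can_stream(self):
        # Says if streaming is supported; Datasette could fall back
        # to the buffered path when this returns False.
        return True

    async def stream_rows(self, rows_iterator, send):
        # Loop through all rows and use send() to push each formatted
        # chunk to the response, never holding the full body in memory.
        async for row in rows_iterator:
            await send((",".join(str(v) for v in row) + "\r\n").encode())
```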

0 reactions
eyeseast commented, Apr 21, 2022

Ha! That was your idea (and a good one).

But it’s probably worth measuring to see what overhead it adds. It did require both passing in the database and making the whole thing async.

Just timing the queries themselves:

  1. Using AsGeoJSON(geometry) as geometry takes 10.235 ms
  2. Leaving as binary takes 8.63 ms

Looking at the network panel:

  1. Takes about 200 ms for the fetch request
  2. Takes about 300 ms

I’m not sure how best to time the GeoJSON generation, but it would be interesting to check. Maybe I’ll write a plugin to add query times to response headers.
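That header idea could look something like this (a hypothetical helper, not an existing plugin; it uses the standard `Server-Timing` header format):

```python
import time

# Hypothetical sketch of the "query times in response headers" idea:
# wrap query execution, measure it, and record the duration in a
# Server-Timing header that browser dev tools can display.
def timed(run_query, headers):
    start = time.perf_counter()
    result = run_query()
    elapsed_ms = (time.perf_counter() - start) * 1000
    headers["Server-Timing"] = f"query;dur={elapsed_ms:.3f}"
    return result
```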

The other thing to consider with async streaming is that it might be well-suited for a slower response. When I have to get the whole result and send a response in a fixed amount of time, I need the most efficient query possible. If I can hang onto a connection and get things one chunk at a time, maybe it’s ok if there’s some overhead.
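That trade-off can be modeled in a toy sketch: a streamed response delivers the first chunk as soon as it is ready, so some per-chunk overhead is tolerable in a way it isn't when the whole body must be buffered first. All names below are illustrative, not Datasette API:

```python
import asyncio

# Toy model of streaming one chunk at a time over a held-open
# connection: the client sees early chunks while later ones are
# still being produced.
async def stream_response(produce_chunk, send, n_chunks):
    for i in range(n_chunks):
        await send(await produce_chunk(i))

async def demo():
    received = []

    async def produce(i):
        await asyncio.sleep(0)  # stands in for per-chunk query work
        return f"chunk-{i}".encode()

    async def send(data):
        received.append(data)  # client receives each chunk as it arrives

    await stream_response(produce, send, 3)
    return received
```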
