
Patch/Stream message to the ColumnDataSource for DataTable widget on front-end triggers response containing entire data source


I’ve been struggling to optimize an application that uses a DataTable widget on the front-end to show the user a catalog of information updated in real time. Originally I used the data source trigger to send information to the front-end. This quickly became unfeasible because the amount of information is quite large. For this reason I tried updating the data source via stream and patch operations. The first problem I came across was that a patch/stream operation takes progressively more time as the size of the data source grows; I raised issue #7072 for that.

Even after my monkey-patch workaround for the previous problem, the situation didn’t improve much. Using Chrome DevTools I monitored the communication over the websocket and found that for each ColumnsStreamed message sent to the front-end, the front-end responds with a ModelChanged message that contains the entire data source. This is hugely inefficient, because as the data source grows these messages get big very quickly, especially when the DataTable has multiple columns. The same happens with the patch operation.

The strange thing is that my application also contains plot widgets, which do not exhibit this behavior. My conclusion is therefore that it is caused by the DataTable widget and not by ColumnDataSource.

Expected behavior

By default, stream and patch operations applied to the ColumnDataSource of a DataTable widget should not trigger any response from the front-end.

System Info

Since this is a very general issue, I’m presuming that this is the only relevant system information:

bokeh: 0.12.10
python: 3.5.2 (default, Nov 17 2016, 17:05:23)
system: Linux-4.4.0-97-generic-x86_64-with-Ubuntu-16.04-xenial

Example

I’ve put together a minimal example that shows this behavior. It creates a page with one button and one DataTable. On button click, the back-end starts sending table rows with the ColumnDataSource stream method. It was implemented by modifying the export_csv example app.

import time
import random

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import DataTable, TableColumn, Button

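class TestSource:
  # Thin wrapper around the ColumnDataSource that backs the DataTable.
  def __init__(self):
    self.source = ColumnDataSource(data = self._empty_source_dict())

  def _empty_source_dict(self):
    return {'x': [], 'quantity': []}

  def add(self, record):
    # stream() expects a sequence per column, so wrap each value in a list.
    record_list = {key: [value] for key, value in record.items()}
    self.source.stream(record_list)

def get_record(index):
  # Build one table row with a random quantity.
  return {
    'x': index,
    'quantity': random.randint(0, 500)
  }

def callback():
  # On button click, stream 30 rows, one per second.
  for i in range(30):
    record = get_record(i)
    test_source.add(record)
    time.sleep(1)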

test_source = TestSource()
columns = [TableColumn(field="x", title="x"), TableColumn(field="quantity", title="quantity")]
table = DataTable(source = test_source.source, columns = columns, width = 400)
button = Button()

button.on_click(callback)

curdoc().add_root(column(button, table))
curdoc().title = "Test of patch and stream"

To illustrate the issue, here is the last message sent to the front-end:

{
  "references": [],
  "events": [
    {
      "column_source": {
        "id": "84c57a47-12e0-43bb-92d6-09a62dbf4707",
        "type": "ColumnDataSource"
      },
      "rollover": null,
      "kind": "ColumnsStreamed",
      "data": {
        "quantity": [41],
        "x": [29]
      }
    }
  ]
}

To which the front-end responds with this one:

{
  "events": [
    {
      "kind": "ModelChanged",
      "model": {
        "type": "ColumnDataSource",
        "id": "84c57a47-12e0-43bb-92d6-09a62dbf4707"
      },
      "attr": "data",
      "new": {
        "quantity": [419, 251, 99, 6, 44, 65, 97, 156, 43, 457, 387, 338, 34, 390, 287, 53, 181, 397, 439, 250, 154, 36, 386, 103, 300, 165, 23, 321, 446, 41],
        "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
      }
    }
  ],
  "references": []
}
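To get a rough feel for how fast the echoed message grows compared with the streamed one, here is a small standard-library sketch. It only mimics the shape of the two JSON messages shown above; it does not use Bokeh, the id is a placeholder, and the helper names (streamed_message, echoed_message, payload_sizes) are made up for illustration.

```python
import json
import random

SOURCE_ID = "placeholder-id"  # hypothetical id, not a real Bokeh model id


def streamed_message(row):
    """Mimic the ColumnsStreamed message: it carries only the new row."""
    return {
        "references": [],
        "events": [{
            "column_source": {"id": SOURCE_ID, "type": "ColumnDataSource"},
            "rollover": None,
            "kind": "ColumnsStreamed",
            "data": {key: [value] for key, value in row.items()},
        }],
    }


def echoed_message(columns):
    """Mimic the ModelChanged reply: it carries every row seen so far."""
    return {
        "events": [{
            "kind": "ModelChanged",
            "model": {"type": "ColumnDataSource", "id": SOURCE_ID},
            "attr": "data",
            "new": columns,
        }],
        "references": [],
    }


def payload_sizes(n_rows):
    """Return (streamed_size, echoed_size) in characters for each added row."""
    columns = {"x": [], "quantity": []}
    sizes = []
    for i in range(n_rows):
        row = {"x": i, "quantity": random.randint(0, 500)}
        for key, value in row.items():
            columns[key].append(value)
        sizes.append((
            len(json.dumps(streamed_message(row))),
            len(json.dumps(echoed_message(columns))),
        ))
    return sizes


if __name__ == "__main__":
    for n, (out_size, back_size) in enumerate(payload_sizes(1000), start=1):
        if n in (1, 10, 100, 1000):
            print(n, out_size, back_size)
```

The streamed message stays roughly constant in size, while the echoed one grows linearly with the number of rows, so the total traffic over a session grows quadratically.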

Conclusion

My hypothesis is that DataTable sends the source data back to Bokeh because the widget supports editable tables, which would explain the need to send updates to the back-end. If this is the case, I would propose adding a parameter that disables this behavior, possibly disabled by default unless the DataTable parameter editable is set to True.

I would appreciate some pointers on how to disable it in the meantime. I’m starting to understand the Bokeh server code base pretty well, but my knowledge is severely lacking in the front-end area.

Issue Analytics

State:
Created 6 years ago
Comments:11 (6 by maintainers)

Top GitHub Comments

bryevdv commented, Oct 8, 2018

As it turns out, there were already mechanisms to distinguish those two code paths, but the patch/stream callbacks did not make use of them. Barring any unforeseen bad interactions, I will have a very small PR soon to fix this, but I need to try to devise a proper test to go along with it.

bryevdv commented, Jan 3, 2018

@codeape2 Thanks for the additional information.

It’s possible there is a way to do that. It will require some investigation. Offhand, my guess is that one (or both) of these two is the culprit:

https://github.com/bokeh/bokeh/blob/master/bokehjs/src/coffee/models/widgets/tables/data_table.coffee#L80-L82

https://github.com/bokeh/bokeh/blob/master/bokehjs/src/coffee/models/widgets/tables/data_table.coffee#L110-L112

And I think that the issue is that the same code path is used to set the data table regardless of whether it’s due to an edit on the browser side (should send an update) or whether the table just got updated via a websocket event (should not send an update). In pretty much every other situation there is already code to prevent message “ping pong”, but as I said above, the DataTable has some specific and particular requirements that seem to be thwarting the normal mechanisms.

Very speculatively I might suggest that it’s possible to decouple these two code paths. Perhaps local browser edits can emit a signal, instead of updating the data source directly. Then the response to the “local edit” signal would make a silent (non-notifying) update. There would be several assumptions to verify first before knowing if this idea has legs. I’d be happy to discuss further, though.
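One possible reading of that idea, as a toy model in plain Python rather than actual BokehJS code: local edits go through a "local edit" signal whose handler updates the data silently and then sends a message, while updates arriving from the server are applied silently with nothing sent back. Every name here (ToyTable, edit_cell, apply_server_stream, the send callable) is hypothetical and chosen only for illustration.

```python
class ToyTable:
    """Toy stand-in for a table widget with two separate update paths."""

    def __init__(self, send):
        self.data = {}                # column name -> list of values
        self.send = send              # callable that "sends" a message to the server
        self._edit_handlers = [self._handle_local_edit]

    def _set_data_silently(self, new_data):
        # Silent (non-notifying) update: nothing is sent anywhere.
        self.data = new_data

    def edit_cell(self, column, index, value):
        # Browser-side edit: emit a signal instead of touching the data directly.
        for handler in self._edit_handlers:
            handler(column, index, value)

    def _handle_local_edit(self, column, index, value):
        # Response to the "local edit" signal: update silently, then tell the server.
        new_data = {key: list(values) for key, values in self.data.items()}
        new_data[column][index] = value
        self._set_data_silently(new_data)
        self.send({"kind": "LocalEdit", "column": column,
                   "index": index, "value": value})

    def apply_server_stream(self, rows):
        # Update that arrived over the websocket: apply it and send nothing back.
        new_data = {key: list(values) for key, values in self.data.items()}
        for key, values in rows.items():
            new_data.setdefault(key, []).extend(values)
        self._set_data_silently(new_data)


if __name__ == "__main__":
    sent = []
    table = ToyTable(send=sent.append)
    table.apply_server_stream({"x": [0], "quantity": [41]})
    print(len(sent))   # no reply for a server-originated update
    table.edit_cell("quantity", 0, 42)
    print(sent)        # exactly one message, for the local edit
```

In this model only the edited cell is reported back, not the whole data source; whether the real widget can work this way depends on the assumptions mentioned above.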

Re: tests, unfortunately our selenium test suite is inoperable at the moment after some recent build refactoring. It needs to be rebuilt from the ground up at some point. Fortunately we can mostly rely on image diff and Python and JS unit tests, and I think testing this could be covered in our BokehJS unit tests (but I’d have to give it some thought).
