Patch/Stream message to the ColumnDataSource for DataTable widget on front-end triggers response containing entire data source
I’ve been struggling to optimize my application, which uses a DataTable widget on the front-end to show the user a catalog of information updated in real time. Originally I used the data source trigger to send information to the front-end. This quickly became unfeasible because the amount of information is quite large. For this reason I tried updating the data source via stream and patch operations. The first problem I came across was that a patch/stream operation takes progressively more time as the data source grows. I raised issue #7072 for that.
Even after my monkey-patch workaround for the previous problem, the situation didn’t improve much. Using Chrome DevTools I monitored the websocket traffic and found that for each ColumnsStreamed message sent to the front-end, it responds with a ModelChanged message that contains the entire data source. This is hugely inefficient, because as the data source grows these messages get big very quickly, especially when the DataTable has multiple columns. The same happens with the patch operation.
The strange thing is that my application also contains plot widgets, which do not exhibit this behavior. Therefore my conclusion is that this is caused by the DataTable widget and not by the ColumnDataSource.
Expected behavior
By default, stream and patch operations applied to the ColumnDataSource of a DataTable widget should not trigger any response from the front-end.
System Info
Since this is a very general issue, I presume this is the only relevant system information:
bokeh: 0.12.10
python: 3.5.2 (default, Nov 17 2016, 17:05:23)
system: Linux-4.4.0-97-generic-x86_64-with-Ubuntu-16.04-xenial
Example
I’ve put together a minimal example that shows this behavior. It creates a page with one button and one DataTable. On button click, the back-end starts sending table rows with the ColumnDataSource stream method. It was implemented by modifying the export_csv example app.
import time
import random

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import DataTable, TableColumn, Button


class TestSource:
    def __init__(self):
        self.source = ColumnDataSource(data=self._empty_source_dict())

    def _empty_source_dict(self):
        return {'x': [], 'quantity': []}

    def add(self, record):
        # stream() expects a dict of column name -> list of new values
        record_list = {key: [value] for key, value in record.items()}
        self.source.stream(record_list)


def get_record(index):
    return {
        'x': index,
        'quantity': random.randint(0, 500)
    }


def callback():
    # Stream one row per second for 30 seconds
    for i in range(30):
        record = get_record(i)
        test_source.add(record)
        time.sleep(1)


test_source = TestSource()
columns = [TableColumn(field="x", title="x"), TableColumn(field="quantity", title="quantity")]
table = DataTable(source=test_source.source, columns=columns, width=400)
button = Button()
button.on_click(callback)
curdoc().add_root(column(button, table))
curdoc().title = "Test of patch and stream"
To illustrate the issue, here is the last message sent to the front-end:
{
  "references": [],
  "events": [
    {
      "column_source": {
        "id": "84c57a47-12e0-43bb-92d6-09a62dbf4707",
        "type": "ColumnDataSource"
      },
      "rollover": null,
      "kind": "ColumnsStreamed",
      "data": {
        "quantity": [41],
        "x": [29]
      }
    }
  ]
}
To which the front-end responds with this one:
{
  "events": [
    {
      "kind": "ModelChanged",
      "model": {
        "type": "ColumnDataSource",
        "id": "84c57a47-12e0-43bb-92d6-09a62dbf4707"
      },
      "attr": "data",
      "new": {
        "quantity": [419, 251, 99, 6, 44, 65, 97, 156, 43, 457, 387, 338, 34, 390, 287, 53, 181, 397, 439, 250, 154, 36, 386, 103, 300, 165, 23, 321, 446, 41],
        "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
      }
    }
  ],
  "references": []
}
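To give a rough sense of how the two messages scale, here is a minimal sketch that rebuilds both message shapes shown above as plain dictionaries and compares their serialized sizes. It does not use Bokeh at all; the function names, the placeholder source id, and the constant quantity value are assumptions made for illustration, and real traffic may differ in detail.

```python
import json


def streamed_message(source_id, row):
    """Server -> browser message shaped like the ColumnsStreamed event above."""
    return {
        "references": [],
        "events": [{
            "column_source": {"id": source_id, "type": "ColumnDataSource"},
            "rollover": None,
            "kind": "ColumnsStreamed",
            "data": {key: [value] for key, value in row.items()},
        }],
    }


def model_changed_message(source_id, columns):
    """Browser -> server message shaped like the ModelChanged event above."""
    return {
        "events": [{
            "kind": "ModelChanged",
            "model": {"type": "ColumnDataSource", "id": source_id},
            "attr": "data",
            "new": columns,
        }],
        "references": [],
    }


def payload_sizes(n_rows, source_id="hypothetical-source-id"):
    """Return (streamed_bytes, echoed_bytes) once the table holds n_rows rows."""
    # Illustrative data: a running index and a constant quantity.
    columns = {"x": list(range(n_rows)), "quantity": [100] * n_rows}
    last_row = {"x": n_rows - 1, "quantity": 100}
    streamed = json.dumps(streamed_message(source_id, last_row))
    echoed = json.dumps(model_changed_message(source_id, columns))
    return len(streamed.encode("utf-8")), len(echoed.encode("utf-8"))


if __name__ == "__main__":
    for n in (30, 1000, 100000):
        streamed_bytes, echoed_bytes = payload_sizes(n)
        print("rows={}  streamed={} B  echoed={} B".format(n, streamed_bytes, echoed_bytes))
```

The streamed message stays roughly constant in size, while the echoed message grows linearly with the number of rows and columns, which matches what was observed in DevTools.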
Conclusion
My hypothesis is that DataTable sends the source data back to Bokeh because the widget supports editable tables, which would explain the need to send updates to the back-end. If this is the case, I would propose adding a parameter that disables this behavior, and possibly disabling it by default unless the DataTable parameter editable is set to True.
I would appreciate some pointers on how to disable it in the meantime. I am starting to understand the Bokeh server code base pretty well, but my knowledge is severely lacking in the front-end area.
Issue Analytics
- Created 6 years ago
- Comments:11 (6 by maintainers)
Top GitHub Comments
As it turns out, there were already mechanisms to distinguish those two code paths, but the patch/stream callbacks did not make use of them. Barring the discovery of any unforeseen bad interactions, I will have a very small PR soon to fix this, but I need to try and devise a proper test to go along with it.
@codeape2 Thanks for the additional information.
It’s possible there is a way to do that. It will require some investigation. Offhand, my guess is that one (or both) of these two is the culprit:
https://github.com/bokeh/bokeh/blob/master/bokehjs/src/coffee/models/widgets/tables/data_table.coffee#L80-L82
https://github.com/bokeh/bokeh/blob/master/bokehjs/src/coffee/models/widgets/tables/data_table.coffee#L110-L112
And I think the issue is that the same code path is used to set the data table regardless of whether it's due to an edit on the browser side (should send an update) or whether the table just got updated via a websocket event (should not send an update). In pretty much every other situation there is already code to prevent message “ping pong”, but as I said above, the DataTable has some specific and particular requirements that seem to be thwarting the normal mechanisms.
Very speculatively I might suggest that it’s possible to decouple these two code paths. Perhaps local browser edits can emit a signal, instead of updating the data source directly. Then the response to the “local edit” signal would make a silent (non-notifying) update. There would be several assumptions to verify first before knowing if this idea has legs. I’d be happy to discuss further, though.
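As a rough illustration of that decoupling idea, here is a toy sketch in plain Python. It is not BokehJS code and not how Bokeh implements it; the Source and Table classes, the silent flag, the send callable, and the "CellEdited" message kind are all hypothetical names invented for this example. It only shows the shape of the idea: remote updates are applied silently, and local edits go through a separate signal whose handler sends just the changed cell.

```python
class Source:
    """Toy stand-in for a data source (hypothetical, not Bokeh's class)."""

    def __init__(self):
        self.data = {}
        self._listeners = []

    def on_change(self, listener):
        self._listeners.append(listener)

    def set_data(self, new_data, silent=False):
        self.data = new_data
        if not silent:
            # The normal notifying path: every listener sees the full data.
            for listener in self._listeners:
                listener(new_data)


class Table:
    """Toy table with separate paths for remote updates and local edits."""

    def __init__(self, source, send):
        self.source = source
        self.send = send  # hypothetical callable that writes to the websocket
        self._edit_handlers = [self._on_local_edit]

    def apply_remote_stream(self, new_rows):
        # Server-originated update: append silently, send nothing back.
        merged = {key: self.source.data.get(key, []) + values
                  for key, values in new_rows.items()}
        self.source.set_data(merged, silent=True)

    def edit_cell(self, column, index, value):
        # Browser-originated edit: emit a signal instead of touching the source.
        for handler in self._edit_handlers:
            handler(column, index, value)

    def _on_local_edit(self, column, index, value):
        updated = {key: list(values) for key, values in self.source.data.items()}
        updated[column][index] = value
        self.source.set_data(updated, silent=True)
        # Only the edited cell is reported, not the whole data source.
        self.send({"kind": "CellEdited", "column": column,
                   "index": index, "value": value})


if __name__ == "__main__":
    sent = []
    source = Source()
    # A notifying listener that would echo the full data, as in the bug report.
    source.on_change(lambda data: sent.append({"kind": "ModelChanged", "new": data}))
    table = Table(source, sent.append)
    table.apply_remote_stream({"x": [0], "quantity": [41]})
    table.edit_cell("quantity", 0, 42)
    print(sent)
```

In this sketch a streamed update produces no outgoing message, while an edit produces one small message. Whether the real widget can be structured this way depends on the assumptions mentioned above.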
Re: tests, unfortunately our selenium test suite is inoperable at the moment after some recent build refactoring. It needs to be rebuilt from the ground up at some point. Fortunately we can mostly rely on image diffs and Python and JS unit tests, and I think testing this could be covered in our BokehJS unit tests (but I’d have to give it some thought).