Checksum calculation should be written in C for a massive speedup in EventAccumulator.Reload()
I recently wanted to load the results of 100+ experiments just to collect a few scalars from each. Because I logged too many images (a common problem), each file is fairly large (~150 MB) and took more than 20 seconds to load with tensorboard.backend.event_processing.event_accumulator.EventAccumulator.Reload(). I got frustrated waiting and decided to see what was taking so long.
Before the change:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
event_acc = EventAccumulator("example_events.out.tfevents")
%time event_acc.Reload()
##out:
CPU times: user 24.4 s, sys: 43.8 ms, total: 24.5 s
Wall time: 24.4 s
After removing the CRC checksums in tensorboard/compat/tensorflow_stub/pywrap_tensorflow.py:
CPU times: user 117 ms, sys: 63.8 ms, total: 181 ms
Wall time: 180 ms
I understand why one would want checksums, but an option to skip computing them would greatly improve the experience for many users (including myself).
Alternatively, the checksum could be rewritten in C and called from Python on the raw data buffer (bytes, bytestream, whatever).
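Both suggestions can be sketched in a few lines. This is a minimal sketch, not TensorBoard's code: it assumes the standard TFRecord framing used by event files (an 8-byte little-endian length, a 4-byte masked CRC of the length bytes, the payload, then a 4-byte masked CRC of the payload), and the names `read_records`, `frame_record`, `masked_crc` and the `crc_fn` parameter are hypothetical. Passing `crc_fn=None` skips verification entirely; passing a CRC-32C function implemented in C would keep verification while moving the per-byte loop out of the interpreter.

```python
import struct

_MASK = 0xFFFFFFFF
_MASK_DELTA = 0xA282EAD8  # constant added by the TFRecord CRC masking scheme


def masked_crc(data, crc_fn):
    """Rotate the CRC right by 15 bits and add a constant, as TFRecord does."""
    crc = crc_fn(data) & _MASK
    return (((crc >> 15) | (crc << 17)) + _MASK_DELTA) & _MASK


def frame_record(data, crc_fn):
    """Hypothetical helper: encode one record in the same framing (handy for tests)."""
    length = struct.pack("<Q", len(data))
    return (length + struct.pack("<I", masked_crc(length, crc_fn))
            + data + struct.pack("<I", masked_crc(data, crc_fn)))


def read_records(stream, crc_fn=None):
    """Yield record payloads; checksums are verified only when crc_fn is given."""
    while True:
        header = stream.read(12)
        if not header:
            return  # clean end of file
        if len(header) < 12:
            raise EOFError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        (length_crc,) = struct.unpack("<I", header[8:])
        data = stream.read(length)
        footer = stream.read(4)
        if len(data) < length or len(footer) < 4:
            raise EOFError("truncated record body")
        if crc_fn is not None:
            # The expensive part: one CRC over the length, one over the payload.
            if masked_crc(header[:8], crc_fn) != length_crc:
                raise ValueError("length checksum mismatch")
            (data_crc,) = struct.unpack("<I", footer)
            if masked_crc(data, crc_fn) != data_crc:
                raise ValueError("data checksum mismatch")
        yield data
```

For a quick round-trip test, `zlib.crc32` works as `crc_fn`, but it computes plain CRC-32 rather than CRC-32C, so it cannot verify real event files; those need a genuine CRC-32C implementation.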
319412 function calls in 24.587 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  6728   24.313    0.004   24.313    0.004  pywrap_tensorflow.py:126(crc_update)
    11    0.041    0.004    0.041    0.004  {method 'read' of '_io.BufferedReader' objects}
 13457    0.037    0.000    0.126    0.000  pywrap_tensorflow.py:255(_read)
...
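A profile like the one above can be collected with the standard-library `cProfile` and `pstats` modules. The helper below is a hypothetical name, sketched under the assumption that the thing being measured is an ordinary Python callable; sorting by `"tottime"` gives the "Ordered by: internal time" view.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=5, **kwargs):
    """Run fn under cProfile; return (result, report sorted by internal time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        profiler.disable()
    out = io.StringIO()
    # Print only the `top` most expensive functions by time spent in their own body.
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(top)
    return result, out.getvalue()
```

With the accumulator from the earlier snippet, `_, report = profile_call(event_acc.Reload)` followed by `print(report)` should produce a report in this format. In IPython, `%prun event_acc.Reload()` is a shorter route to similar output.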
My guess at the slow part of crc_update (tensorboard/compat/tensorflow_stub/pywrap_tensorflow.py):
...
for b in buf:
    table_index = (crc ^ b) & 0xFF
    crc = (CRC_TABLE[table_index] ^ (crc >> 8)) & _MASK
This is exactly the kind of computation that would be much faster in C: a pure-Python loop pays interpreter overhead on every single byte. It is fine for small amounts of data, but gets out of hand when reading lots of image/audio data.
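For context, here is a self-contained version of that loop. It is a reconstruction, not the file's verbatim contents: it assumes the elided parts follow the standard reflected, table-driven CRC-32C (Castagnoli polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF). `compare()` is a hypothetical helper that times the loop against `zlib.crc32`; zlib computes plain CRC-32, not CRC-32C, so it only illustrates what the same kind of loop costs when done in C.

```python
import time
import zlib

_MASK = 0xFFFFFFFF
_CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def _make_table(poly):
    """Build the 256-entry lookup table for a reflected CRC-32."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = _make_table(_CRC32C_POLY)


def crc_update(crc, buf):
    """Pure-Python table-driven update: one interpreted iteration per byte."""
    for b in buf:
        table_index = (crc ^ b) & 0xFF
        crc = (CRC_TABLE[table_index] ^ (crc >> 8)) & _MASK
    return crc


def crc32c(data):
    """CRC-32C of data: start from all ones and invert the final value."""
    return crc_update(_MASK, data) ^ _MASK


def compare(n_bytes=1_000_000):
    """Return (python_seconds, zlib_seconds) for hashing n_bytes of data."""
    data = bytes(range(256)) * (n_bytes // 256)
    t0 = time.perf_counter()
    crc32c(data)
    t1 = time.perf_counter()
    zlib.crc32(data)  # C implementation, but of plain CRC-32
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

`crc32c(b"123456789")` returns 0xE3069283, the standard CRC-32C check value. On CPython the first number returned by `compare()` is typically orders of magnitude larger than the second, though exact timings depend on the machine.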
Thank you team TB.
(Issue created 3 years ago; 7 comments, 2 by maintainers.)
We have some work in the pipeline for a data loading implementation that works without TensorFlow and is orders of magnitude faster than even the current reading implementation that does use TensorFlow. It’s definitely still experimental, but if you’re interested, you can follow pull requests tagged with core:rustboard…
(It will probably also have an option to skip computing checksums. 😃 )
@stephanwlee Installing tensorboard is just as fast as nuking the checksums, so I’m just going to close the issue. Thanks!