Performance comparison of make_reader() & make_petastorm_dataset() vs make_spark_converter() & make_tf_dataset()
From the API user guide, it seems that there are two different ways of using Petastorm to train TensorFlow models:
- Using make_reader() or make_batch_reader() and then using make_petastorm_dataset() to create a tf.data object
- Using make_spark_converter() to materialize the dataset and then using converter.make_tf_dataset() to create a tf.data object (both options are sketched below)
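For reference, a minimal sketch of the two options; the Parquet store URL, the cache directory, and the toy DataFrame are hypothetical placeholders, not details from the original issue:

```python
from pyspark.sql import SparkSession
from petastorm import make_batch_reader
from petastorm.tf_utils import make_petastorm_dataset
from petastorm.spark import SparkDatasetConverter, make_spark_converter

# --- Option 1: read an existing Parquet store directly ---
# Any file://, hdfs://, or s3:// Parquet store works with make_batch_reader;
# use make_reader() instead for stores written with a Petastorm Unischema.
# The URL below is a hypothetical placeholder.
with make_batch_reader('file:///tmp/my_parquet_store') as reader:
    dataset = make_petastorm_dataset(reader)
    for batch in dataset.take(1):
        print(batch)

# --- Option 2: materialize a Spark DataFrame first ---
spark = SparkSession.builder.master('local[2]').getOrCreate()
df = spark.range(100)  # toy DataFrame standing in for real training data

# The converter caches df as Parquet under this (hypothetical) directory.
spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
               'file:///tmp/petastorm_cache')
converter = make_spark_converter(df)
with converter.make_tf_dataset() as dataset:
    for batch in dataset.take(1):
        print(batch)
```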
All things being equal, which of these would be expected to perform faster? I know that option 1 reads from a file path while option 2 starts with a Spark DataFrame. Option 2 seems simpler, but is there a performance cost associated with it?
Thanks
Top GitHub Comments
make_spark_converter -> make_tf_dataset uses make_batch_reader + make_petastorm_dataset underneath (to read from the temporary Parquet store it creates).
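In rough terms, a simplified sketch of what the converter does under the hood, per the comment above; the cache URL and variable name are hypothetical, not taken from the Petastorm source:

```python
from petastorm import make_batch_reader
from petastorm.tf_utils import make_petastorm_dataset

# make_spark_converter(df) has already written df to a temporary Parquet
# store under the configured cache directory. Conceptually, the converter
# then reads it back the same way option 1 would:
cached_parquet_url = 'file:///tmp/petastorm_cache/...'  # hypothetical path
with make_batch_reader(cached_parquet_url) as reader:
    dataset = make_petastorm_dataset(reader)
```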
Can you please provide more information on the slowdown?
It would be best if you could distill a small example I could actually run and profile. It might be hard to see the issue just from the code as it’s likely about the combination of the code and the data structure underneath.
No problem at all…
Can you please take a look at the Horovod example:
https://github.com/horovod/horovod/blob/master/examples/spark/keras/keras_spark_rossmann_run.py
I know they were polishing training pipeline performance and it has a good batch-based implementation. Perhaps it will give you some clues.