
Benchmarked dataset iteration speed lower than expected


Hello!

I’m really excited about deeplake’s features (streaming directly from S3, dataset versioning, and filtering). However, a preliminary benchmark showed significantly lower dataset iteration speed with deeplake than with local file storage when iterating over (256,256,3) uint8 PNGs (a rough timing sketch follows the list):

  • local dataset using tf.io loader: ~70-1000 batches/s
  • local dataset using PIL loader: ~25 batches/s
  • local dataset using deeplake dataset: ~5 batches/s
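
For reference, the measurement boils down to a timing loop like the sketch below. This is a minimal illustration only, assuming deeplake’s tensorflow() integration, a hypothetical S3 path, and a batch size of 32; the full benchmark code is linked further down.

    import time
    import deeplake
    import tensorflow as tf

    DATASET_PATH = "s3://my-bucket/benchmark-dataset"  # hypothetical path
    BATCH_SIZE = 32

    ds = deeplake.load(DATASET_PATH, read_only=True)

    # ds.tensorflow() exposes the deeplake dataset as a tf.data.Dataset.
    tf_ds = ds.tensorflow().batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

    start = time.perf_counter()
    num_batches = sum(1 for _ in tf_ds)
    elapsed = time.perf_counter() - start
    print(f"{num_batches / elapsed:.1f} batches/s at batch size {BATCH_SIZE}")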

Sidenote: When iterating over the deeplake dataset, download speed was ~50MB/s. When downloading a single 3GB file from S3, average download speed was ~300MB/s.
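
(Not part of the original benchmark code, but for context: the single-file comparison can be reproduced with a sketch like the one below, using boto3 and a hypothetical bucket and key.)

    import time
    import boto3

    BUCKET, KEY, LOCAL_PATH = "my-bucket", "large-file.bin", "/tmp/large-file.bin"  # hypothetical

    s3 = boto3.client("s3")
    size_bytes = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

    start = time.perf_counter()
    s3.download_file(BUCKET, KEY, LOCAL_PATH)
    elapsed = time.perf_counter() - start
    print(f"average download speed: {size_bytes / elapsed / 1e6:.0f} MB/s")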

The small benchmark code is stored here: https://github.com/cgebbe/benchmark_deeplake/tree/50621dd28a08208fe70deb07d451d01474687b54

Are these numbers to be expected? Am I using the library wrong, or are higher speeds only available via Activeloop storage and not S3? I had hoped that the iteration speed would be at least as fast as the local PIL loader.

Issue Analytics

  • State: open
  • Created: 9 months ago
  • Comments: 9

Top GitHub Comments

1 reaction
AbhinavTuli commented, Dec 14, 2022

Hey @cgebbe! Thanks for raising this issue. I see from the benchmarks that you used deeplake’s tensorflow integration. That integration is a very thin wrapper and is not optimized right now. We have two other dataloaders, available via ds.pytorch() and ds.dataloader() (the latter is an enterprise feature right now, built in C++); both should give significantly better performance. Could you try those and let us know if the issue persists?
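
Roughly, the two suggested loaders are used as in the sketch below. This is a minimal illustration with a hypothetical dataset path; the exact builder methods of ds.dataloader() may differ between deeplake versions.

    import deeplake

    ds = deeplake.load("s3://my-bucket/benchmark-dataset", read_only=True)  # hypothetical path

    # Pure-Python PyTorch dataloader.
    torch_loader = ds.pytorch(batch_size=32, num_workers=4, shuffle=False)

    # C++-based enterprise dataloader (requires libdeeplake), built with a fluent API.
    fast_loader = ds.dataloader().batch(32).pytorch(num_workers=4)

    for batch in fast_loader:
        pass  # timing / training loop goes here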

0 reactions
cgebbe commented, Dec 15, 2022

I’ll follow up on our discussion here so that others can see it, too.

Running python3 -m pip uninstall libdeeplake; python3 -m pip install libdeeplake==0.0.32 fixed the segmentation fault issue, thanks a lot!

As promised, the optimized dataloader is slightly faster than the tensorflow dataloader that uses PIL:

  • using PIL: ~15-25 batches/s
  • using deeplake’s optimized dataloader with torch on an r6i.xlarge instance: ~20 batches/s (at ~150MB/s)
  • using deeplake’s optimized dataloader with torch on a p3.16xlarge instance: ~30 batches/s (at ~250MB/s)

@AbhinavTuli: I believe you mentioned in our discussion that you still achieve significantly higher download speeds; is this correct?

Next steps for us are to…

  • benchmark the example dataset using local tfrecords files (a rough tf.data sketch follows below)
  • run an actual training on realistic data and monitor GPU utilization. For this, we likely need to wait until the C++ loader supports tensorflow.

Thanks again for the support!
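
For reference, the tfrecords baseline we have in mind is roughly the sketch below. This is a minimal illustration only, assuming one serialized PNG per record under a hypothetical "image" key and a hypothetical file location.

    import time
    import tensorflow as tf

    files = tf.data.Dataset.list_files("/data/benchmark/*.tfrecord")  # hypothetical location

    def parse(example_proto):
        features = {"image": tf.io.FixedLenFeature([], tf.string)}
        parsed = tf.io.parse_single_example(example_proto, features)
        return tf.io.decode_png(parsed["image"], channels=3)

    ds = (
        tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
        .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(32)
        .prefetch(tf.data.AUTOTUNE)
    )

    start = time.perf_counter()
    num_batches = sum(1 for _ in ds)
    print(f"{num_batches / (time.perf_counter() - start):.1f} batches/s")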

Current code: https://github.com/cgebbe/benchmark_deeplake/blob/8543d1eabdb0e6c0bebd7a4700e7f5c88555c04f/README.md
