
Insanely slow data reading from google drive

See original GitHub issue
  • Describe the current behavior:


I’m trying to read data as pairs of .png/.pkl files

The problem is that sometimes the data reading speed is around 300 samples per second (the expected rate; I almost always get this speed in Colab, and on a local machine with an HDD as well), but sometimes it falls to 1 sample per second, a 300x slowdown, which is of course not expected behaviour.
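
Roughly, the speed can be measured like this (an illustrative timing helper, not the exact code from the notebook; it just reads a batch of .png/.pkl pairs and prints the throughput):

import time
import pickle
import cv2

def measure_read_speed(image_paths, pickle_paths, n=300):
    # Read up to n .png/.pkl pairs and report samples per second.
    pairs = list(zip(image_paths, pickle_paths))[:n]
    start = time.time()
    for img_path, pkl_path in pairs:
        image = cv2.imread(img_path)
        with open(pkl_path, 'rb') as file:
            pickle_data = pickle.load(file)
    elapsed = time.time() - start
    print(f'{len(pairs) / elapsed:.1f} samples per second')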

To future readers: I think it is about caching. I found a little trick to prevent this slow loading: I just had to delete the whole unpacked dataset and then run !unzip dataset.zip again; after that the data was cached and it started to load pretty fast!
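
In a notebook cell the trick amounts to roughly the following (a plain-Python sketch; the folder and archive names are the ones assumed below, and the original just ran !unzip directly):

import shutil
import zipfile

# Remove the whole unpacked dataset folder, then re-extract the archive,
# the same effect as deleting the folder and running !unzip dataset.zip.
shutil.rmtree('dataset', ignore_errors=True)
with zipfile.ZipFile('dataset.zip') as archive:
    archive.extractall('.')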

Browser : Chrome, ver. 86.0.4240.183, 64-bit

I can't share a notebook, though, because it uses a lot of on-disk dependencies, so you wouldn't be able to reproduce the same problem with it.

But if you really want to improve this part of your service, here is how you can try to replicate it:

First of all, suppose the notebook is located at “content/drive/My Drive/some_folder_name” and the dataset is located at “content/drive/My Drive/some_folder_name/dataset”.

Then, for all image paths and pkl paths, we read the image via cv2.imread and read the pickle via:

with open(pickle_path, 'rb') as file:
    pickle_data = pickle.load(file)

Dataset architecture: [screenshot]

Torch dataset class: [screenshot]

Create loader function: [screenshot]
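
Since the three screenshots are not reproduced here, a minimal sketch of what such a pair-reading dataset and loader could look like (all class, function, and parameter names are illustrative guesses, not taken from the screenshots):

import pickle
import cv2
from torch.utils.data import Dataset, DataLoader

class PairDataset(Dataset):
    # Reads matching .png/.pkl pairs from two lists of file paths.
    def __init__(self, image_paths, pickle_paths):
        self.image_paths = image_paths
        self.pickle_paths = pickle_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = cv2.imread(self.image_paths[idx])
        with open(self.pickle_paths[idx], 'rb') as file:
            pickle_data = pickle.load(file)
        return image, pickle_data

def create_loader(image_paths, pickle_paths, batch_size=32, num_workers=2):
    # Default collation may need a custom collate_fn depending on what the pickles contain.
    dataset = PairDataset(image_paths, pickle_paths)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)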

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6

Top GitHub Comments

4 reactions
stephengmatthews commented, Nov 6, 2020

Have you tried copying the data to local storage instead of reading the data from google drive? If not, it’s worth trying this by copying your data to /tmp at the start of every session before running your models. Something like this:

! cp -r "/content/drive/My Drive/some_folder_name/dataset" /tmp/dataset
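
If the dataset also exists as a single dataset.zip on Drive (as the re-unzip trick above suggests), copying that one archive and extracting it on the local disk is usually much faster than copying thousands of small files. A sketch, shown in Python rather than a shell cell, assuming the archive sits next to the dataset folder on Drive:

import shutil
import zipfile

# Copy the single archive from Drive to the local VM disk, then extract it there.
# Adjust the target path if the archive already contains a top-level folder.
shutil.copy('/content/drive/My Drive/some_folder_name/dataset.zip', '/tmp/dataset.zip')
with zipfile.ZipFile('/tmp/dataset.zip') as archive:
    archive.extractall('/tmp/dataset')
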
1 reaction
dazzle-me commented, Dec 22, 2020

Stopped using Colab service.

Read more comments on GitHub >

Top Results From Across the Web

Troubleshoot slow performance in Drive - Google Support
Troubleshoot slow performance in Drive · Step 1: Check the user's browser · Step 2: Check the user's computer · Step 3: Check...
Read more >
Solved: 13 FREE Ways to Fix Google Drive Upload Slow Issue
13 Solutions to Fix the Google Drive Upload Slow Issue · Way 1: Check Google Drive Upload Speed · Way 2: Change Google...
Read more >
Slow Uploads on Google Drive: How to Fix - Alphr
1. Open your browser (This tutorial will use Google Chrome, but the steps are similar for most browsers.) 2. Use the following shortcut:...
Read more >
Why has Google Drive become so slow? - Quora
This problem is not usually caused by Google Drive app. The actual reason behind the problem might be inadequate hardware of the system...
Read more >
Why is Google Drive so Slow to Upload? (How to Fix It)
Key Takeaways · Identify where the issue is, starting with your data's destination: Google Drive. · Test your internet speed to see if...
Read more >
