prepare_data from the arcgis.learn module fails to read the data on an Azure N-series VM
Describe the bug We recently acquired Azure N-series VMs, which are powered by the NVIDIA Tesla K80 card and the Intel Xeon E5-2690 v3 (Haswell) processor. Until now we had been using the arcgis.learn module on VMs without a GPU, and U-Net semantic segmentation with arcgis.learn.UnetClassifier worked there without problems.
However, after moving to a GPU-enabled Azure N-series VM, arcgis.learn prepare_data started failing.
To Reproduce Steps to reproduce the behavior:
import arcgis
from arcgis.learn import prepare_data
import fastai
import torch
import torchvision
print(arcgis.__version__)
print(fastai.__version__)
print(torch.__version__)
print(torchvision.__version__)
1.6.2
1.0.39
1.0.0
0.2.2
data = prepare_data(path=r'Path/to/training/data',batch_size=16)
error:
Exception Traceback (most recent call last)
<ipython-input-2-b64d83b2729f> in <module>
----> 1 data = prepare_data(path=r'Path/to/training/data',batch_size=16)
~\AppData\Local\ESRI\conda\envs\arcgispro-py3-deeplearningpro\lib\site-packages\arcgis\learn\_data.py in prepare_data(path, class_mapping, chip_size, val_split_pct, batch_size, transforms, collate_fn, seed, dataset_type)
130
131 if not HAS_FASTAI:
--> 132 _raise_fastai_import_error()
133
134 if type(path) is str:
~\AppData\Local\ESRI\conda\envs\arcgispro-py3-deeplearningpro\lib\site-packages\arcgis\learn\_data.py in _raise_fastai_import_error()
20
21 def _raise_fastai_import_error():
---> 22 raise Exception('This module requires fastai, PyTorch and torchvision as its dependencies. Install it using "conda install -c pytorch -c fastai fastai=1.0.39 pytorch=1.0.0 torchvision"')
23
24 def _bb_pad_collate(samples, pad_idx=0):
Exception: This module requires fastai, PyTorch and torchvision as its dependencies. Install it using "conda install -c pytorch -c fastai fastai=1.0.39 pytorch=1.0.0 torchvision"
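The traceback shows that the exception is raised whenever the HAS_FASTAI flag is false, so the message itself does not say which dependency failed or why. A minimal diagnostic sketch follows, assuming the flag is cleared when one of the dependency imports fails at module load (this is inferred from the traceback, not confirmed). The helper name `check_imports` is hypothetical; it imports each package separately and reports the underlying error.

```python
import importlib


def check_imports(module_names):
    """Try importing each module; map name -> None on success, or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # e.g. ImportError, or OSError from a missing DLL
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # The three dependencies named in the arcgis.learn error message.
    for name, error in check_imports(["torch", "torchvision", "fastai"]).items():
        print(f"{name}: {'OK' if error is None else error}")
```

Running this in the same conda environment as the failing notebook should show whether one of the three packages is missing or fails to load on the GPU VM.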
Expected behavior The data should be processed. The same data works on an Azure VM without a GPU.
Platform (please complete the following information):
- OS: Windows Server 2019
- VM: Standard_NC6; for details see https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu
- Browser: Google Chrome
- Python API Version: 1.6.2; also tested on 1.7.0, where the issue persists
Additional context I tested in ArcGIS Pro's environment and also in a new environment created with Anaconda. The issue persists in both on the GPU-enabled Azure N-series VM.
Issue Analytics
- Created 4 years ago
- Comments:13 (1 by maintainers)
Top GitHub Comments
To reproduce, run the install command given in the exception raised by arcgis.learn.prepare_data, using either the error message posted by the OP or the one I posted.
original Exception:
my Exception:
I suggest changing installation_steps for 'win32' in _data.py to match the one for ['linux', 'darwin'], so that it reflects the working code and future errors don't lead users down the same path.
Ref: arcgis.learn._data.py (v1.8.0), line 81
Until Pro ships with an arcgis API version greater than 1.7.0, these errors could persist.
Since torchvision is a high-level neural network API, it uses Pillow to stack the data required for training. On Windows, Pillow has issues opening image files. On researching further, I found many bug reports about Pillow having problems with TIFF files, along with recommendations to downgrade the libtiff library to make it work. Even after downgrading libtiff, the issue persisted. After multiple failed attempts I finally exported the training data in JPEG format. This workaround worked for me.
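To check whether Pillow really is the component that cannot read the exported chips, a small sketch like the following may help. It assumes Pillow is installed and that the training images sit somewhere under one folder. The function name `find_unreadable_images` and its `opener` parameter are hypothetical and exist only for this illustration.

```python
from pathlib import Path


def find_unreadable_images(folder, opener=None,
                           patterns=("*.tif", "*.tiff", "*.jpg", "*.png")):
    """Return {path: error text} for every image under folder that fails to open."""
    if opener is None:
        # Default: fully decode each file with Pillow (assumed to be installed).
        from PIL import Image

        def opener(path):
            with Image.open(path) as img:
                img.load()  # force decoding, not just header parsing

    failures = {}
    for pattern in patterns:
        for path in sorted(Path(folder).rglob(pattern)):
            try:
                opener(path)
            except Exception as exc:
                failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return failures
```

Calling `find_unreadable_images(r'Path/to/training/data')` on the TIFF export and again on the JPEG export would show whether only the TIFF chips fail to decode.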