[Azure ML SDK v2] Issue while reading data from `uri_folder` Input type via https://<account_name> scheme
- Package Name: azure-ai-ml
- Package Version: 1.0.0
- Operating System: Windows Server 2022 Standard
- Python Version: 3.9.13
Describe the bug
According to the documentation, it should be possible to access public blob storage containers through an Input(type='uri_folder') instance. For the actual data path, the Azure docs say that either
https://<account_name>.blob.core.windows.net/<container_name>/<path>
or
abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>
can be used.
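Both documented formats encode the same three components: the storage account, the container (called a file system in ADLS Gen2), and the path inside it. The minimal sketch below, using only the Python standard library, splits either form into those components, which can help confirm that a failing path and a working path point to the same data. The function `parse_storage_uri` is hypothetical and is not part of azure-ai-ml.

```python
from urllib.parse import urlparse


def parse_storage_uri(uri: str) -> tuple[str, str, str]:
    """Split an https:// blob URL or an abfss:// URI into
    (account_name, container_name, path).

    Hypothetical helper for illustration only; not part of azure-ai-ml.
    """
    parsed = urlparse(uri)
    if parsed.scheme == "https":
        # https://<account_name>.blob.core.windows.net/<container_name>/<path>
        account = parsed.netloc.split(".", 1)[0]
        container, _, path = parsed.path.lstrip("/").partition("/")
    elif parsed.scheme == "abfss":
        # abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>
        container, _, host = parsed.netloc.partition("@")
        account = host.split(".", 1)[0]
        path = parsed.path.lstrip("/")
    else:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    return account, container, path
```

For example, `parse_storage_uri("https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes")` returns `("azureopendatastorage", "mlsamples", "diabetes")`.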
I tried the first option (https://) with the diabetes dataset, which is available at https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes. However, this access method fails with the following error:
{"NonCompliant":"DataAccessError(NotFound)"}
{
  "code": "data-capability.UriMountSession.PyFuseError",
  "target": "",
  "category": "UserError",
  "error_details": [
    {
      "key": "NonCompliantReason",
      "value": "DataAccessError(NotFound)"
    },
    {
      "key": "StackTrace",
      "value": " File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py\", line 70, in start\n (data_path, sub_data_path) = session.start()\n\n File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/data_sessions.py\", line 364, in start\n options=mnt_options\n\n File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 696, in rslex_uri_volume_mount\n raise e\n\n File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 690, in rslex_uri_volume_mount\n mount_context = RslexDirectURIMountContext(mount_point, uri, options)\n"
    }
  ]
}
AzureMLCompute job failed.
data-capability.UriMountSession.PyFuseError: [REDACTED]
Reason: [REDACTED]
StackTrace: File "/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py", line 70, in start
(data
With the wasbs:// form of the same location instead, i.e. wasbs://mlsamples@azureopendatastorage.blob.core.windows.net/diabetes, the job finishes successfully.
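Because the wasbs:// form works while the https:// form fails, one possible workaround is to rewrite the https URL before passing it to Input. The minimal sketch below assumes the URL always has the shape https://<account_name>.blob.core.windows.net/<container_name>/<path>; the helper `https_to_wasbs` is hypothetical and is not an azure-ai-ml API.

```python
from urllib.parse import urlparse


def https_to_wasbs(url: str) -> str:
    """Rewrite https://<account>.blob.core.windows.net/<container>/<path>
    as wasbs://<container>@<account>.blob.core.windows.net/<path>.

    Hypothetical helper for illustration only; not part of azure-ai-ml.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc.endswith(".blob.core.windows.net"):
        raise ValueError(f"not an https blob URL: {url!r}")
    container, _, path = parsed.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"no container in URL: {url!r}")
    # The host (<account>.blob.core.windows.net) is kept unchanged; the container
    # moves from the URL path into the part before "@" in the authority.
    return f"wasbs://{container}@{parsed.netloc}/{path}"
```

For example, `https_to_wasbs("https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes")` gives `wasbs://mlsamples@azureopendatastorage.blob.core.windows.net/diabetes`, the path reported above to work, and the result can be passed as `path` to `Input(type="uri_folder", ...)`.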
To Reproduce
Steps to reproduce the behavior: execute the following code:
from azure.ai.ml import Input, MLClient, command

# Workspace and credential arguments are omitted in the original report.
ml_client = MLClient(...)

# List the contents of the mounted input folder.
job = command(
    command="ls ${{inputs.diabetes}}",
    inputs={
        "diabetes": Input(
            type="uri_folder",
            path="https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes",
        )
    },
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="cpu-cluster",
    display_name="data_access_test",
    # description,
    experiment_name="data_access_test",
)
ml_client.create_or_update(job)
Expected behavior
The job completes successfully, and the user logs show the list of files inside the passed blob storage folder.
Top GitHub Comments
Then the documentation should be updated, I guess? Wherever it says that a uri_folder can be accessed via the https protocol, should that be removed? For instance, here or here.
Or will support for https + uri_folder eventually be added?

I think this will be a documentation improvement.