[Azure ML SDK v2] Issue while reading data from `uri_folder` Input type via https://<account_name> scheme

Package Name: azure-ai-ml
Package Version: 1.0.0
Operating System: Windows Server 2022 Standard
Python Version: 3.9.13

Describe the bug

According to the documentation, it should be possible to access public blob storage containers using an Input(type='uri_folder') instance. For the data path, the Azure docs say that either of two formats can be used: https://<account_name>.blob.core.windows.net/<container_name>/<path> or abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>.

I tried the first option (https://) with the diabetes dataset, which is available at https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes. However, this access method fails with the following error:

{"NonCompliant":"DataAccessError(NotFound)"}
{
  "code": "data-capability.UriMountSession.PyFuseError",
  "target": "",
  "category": "UserError",
  "error_details": [
    {
      "key": "NonCompliantReason",
      "value": "DataAccessError(NotFound)"
    },
    {
      "key": "StackTrace",
      "value": "  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py\", line 70, in start\n    (data_path, sub_data_path) = session.start()\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/data_sessions.py\", line 364, in start\n    options=mnt_options\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 696, in rslex_uri_volume_mount\n    raise e\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 690, in rslex_uri_volume_mount\n    mount_context = RslexDirectURIMountContext(mount_point, uri, options)\n"
    }
  ]
}


AzureMLCompute job failed.
data-capability.UriMountSession.PyFuseError: [REDACTED]
  Reason: [REDACTED]
  StackTrace:   File "/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py", line 70, in start
    (data

With a wasbs:// path instead, i.e. wasbs://mlsamples@azureopendatastorage.blob.core.windows.net/diabetes, the job finishes successfully.
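The failing https:// path and the working wasbs:// path in this report name the same account, container, and folder, so one can be derived from the other. The following is a minimal sketch of that rewrite using only the Python standard library. The helper name `https_blob_url_to_wasbs` is hypothetical and not part of azure-ai-ml, and the sketch assumes the URL follows the https://<account_name>.blob.core.windows.net/<container_name>/<path> layout quoted above.

```python
from urllib.parse import urlparse


def https_blob_url_to_wasbs(url: str) -> str:
    """Rewrite https://<account>.blob.core.windows.net/<container>/<path>
    as wasbs://<container>@<account>.blob.core.windows.net/<path>.

    Hypothetical helper for illustration; not an azure-ai-ml API.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc.endswith(".blob.core.windows.net"):
        raise ValueError(f"not an https blob storage URL: {url}")
    # The first path segment is the container; the rest is the path inside it.
    container, _, blob_path = parsed.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"no container in URL: {url}")
    return f"wasbs://{container}@{parsed.netloc}/{blob_path}"


if __name__ == "__main__":
    print(https_blob_url_to_wasbs(
        "https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes"
    ))
```

The returned string could then be passed as the `path` of the `Input`, on the assumption that the wasbs:// form keeps working as described in this report.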

To Reproduce

Execute the following code:

from azure.ai.ml import Input, MLClient, command

ml_client = MLClient(...)

job = command(
    command="ls ${{inputs.diabetes}}",
    inputs={
        "diabetes": Input(
            type="uri_folder",
            path="https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes",
        )
    },
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="cpu-cluster",
    display_name="data_access_test",
    # description,
    experiment_name="data_access_test"
)

ml_client.create_or_update(job)

Expected behavior

The job completes successfully, and the user logs show the list of files inside the given blob storage folder.

Top GitHub Comments

glebrh commented, Nov 29, 2022

Then the documentation should be updated, I guess? Wherever it is mentioned that access to uri_folder is possible via https protocol, it should be removed?

For instance here or here

Or eventually, support for https + uri_folder will be added?

luigiw commented, Dec 16, 2022

I think this will be a document improvement.