[Azure ML SDK v2] Issue while reading data from `uri_folder` Input type via https://<account_name> scheme

  • Package Name: azure-ai-ml
  • Package Version: 1.0.0
  • Operating System: Windows Server 2022 Standard
  • Python Version: 3.9.13

Describe the bug

According to the documentation, it should be possible to access public blob storage containers using an Input(type='uri_folder') instance. For the path of the data, the Azure docs say that either the https://<account_name>.blob.core.windows.net/<container_name>/<path> or the abfss://<file_system>@<account_name>.dfs.core.windows.net/<path> format can be used.

I tried the first option (https://) with the diabetes dataset, which is available at https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes. However, this access method fails with the following error:

{"NonCompliant":"DataAccessError(NotFound)"}
{
  "code": "data-capability.UriMountSession.PyFuseError",
  "target": "",
  "category": "UserError",
  "error_details": [
    {
      "key": "NonCompliantReason",
      "value": "DataAccessError(NotFound)"
    },
    {
      "key": "StackTrace",
      "value": "  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py\", line 70, in start\n    (data_path, sub_data_path) = session.start()\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/data_sessions.py\", line 364, in start\n    options=mnt_options\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 696, in rslex_uri_volume_mount\n    raise e\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 690, in rslex_uri_volume_mount\n    mount_context = RslexDirectURIMountContext(mount_point, uri, options)\n"
    }
  ]
}


AzureMLCompute job failed.
data-capability.UriMountSession.PyFuseError: [REDACTED]
  Reason: [REDACTED]
  StackTrace:   File "/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py", line 70, in start
    (data

With a wasbs:// path instead, i.e. wasbs://mlsamples@azureopendatastorage.blob.core.windows.net/diabetes, the job finishes successfully.
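The failing https:// path and the working wasbs:// path in this report point at the same container and folder; they differ only in the scheme and in where the container name sits. As a minimal sketch of a workaround, the helper below rewrites one form into the other using only the standard library. The function name `https_blob_to_wasbs` is hypothetical and not part of azure-ai-ml, and the sketch assumes the plain `<account_name>.blob.core.windows.net` host layout shown in the paths above.

```python
from urllib.parse import urlsplit


def https_blob_to_wasbs(url: str) -> str:
    """Rewrite https://<account>.blob.core.windows.net/<container>/<path>
    as wasbs://<container>@<account>.blob.core.windows.net/<path>.

    Hypothetical helper for illustration; not an azure-ai-ml API.
    """
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.netloc.endswith(".blob.core.windows.net"):
        raise ValueError(f"not an https blob storage URL: {url}")

    # The first path segment is the container; the rest is the folder path.
    container, _, path = parts.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"no container name in URL: {url}")

    return f"wasbs://{container}@{parts.netloc}/{path}"


print(https_blob_to_wasbs(
    "https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes"
))
```

The resulting string can then be passed as the `path` of the `uri_folder` Input in place of the https:// URL. Whether this is appropriate for a given storage account is something to verify case by case; the report only confirms it for the public diabetes sample.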

To Reproduce

Steps to reproduce the behavior: execute the following code:

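from azure.ai.ml import Input, MLClient, command

# Client arguments are elided in the original report.
ml_client = MLClient(...)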

job = command(
    command="ls ${{inputs.diabetes}}",
    inputs={
        "diabetes": Input(
            type="uri_folder",
            path="https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes",
        )
    },
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="cpu-cluster",
    display_name="data_access_test",
    # description,
    experiment_name="data_access_test"
)

ml_client.create_or_update(job)

Expected behavior

The job completes successfully, and the user logs show the list of files inside the blob storage folder that was passed in.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

glebrh commented, Nov 29, 2022 (1 reaction)

Then the documentation should be updated, I guess? Wherever it is mentioned that access to uri_folder is possible via https protocol, it should be removed?

For instance here or here

Or eventually, support for https + uri_folder will be added?

luigiw commented, Dec 16, 2022 (0 reactions)

I think this will be a documentation improvement.

