[Azure ML SDK v2] Method to download Data asset locally


The Data class in the Azure ML SDK v2 supports creating and uploading a new Data asset, but not downloading one. I understand that the idea is to not use the new SDK inside training jobs. However, for exploration purposes it is very handy to be able to download a registered Data asset, as is possible with the SDK v1.

Would it be possible to add this feature? Alternatively, is there a way (using other parts of the SDK?) to download assets with paths such as azureml://datastores/<data_store_name>/paths/<path>?
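For readers unfamiliar with the URI shape mentioned in the question, here is a minimal sketch that splits a short-form `azureml://datastores/<data_store_name>/paths/<path>` URI into its datastore name and relative path. The names `DatastoreUri` and `parse_datastore_uri` are hypothetical helpers written for illustration and are not part of the azure-ai-ml package; other URI forms are not handled.

```python
import re
from typing import NamedTuple


class DatastoreUri(NamedTuple):
    """Hypothetical container for the two parts of a short-form datastore URI."""
    datastore: str
    path: str


# Matches only the short form quoted in the question:
# azureml://datastores/<data_store_name>/paths/<path>
_SHORT_FORM = re.compile(
    r"^azureml://datastores/(?P<datastore>[^/]+)/paths/(?P<path>.+)$"
)


def parse_datastore_uri(uri: str) -> DatastoreUri:
    """Split a short-form datastore URI into (datastore, path).

    Raises ValueError for anything that does not match the short form.
    """
    match = _SHORT_FORM.match(uri)
    if match is None:
        raise ValueError(f"Not a short-form azureml datastore URI: {uri!r}")
    return DatastoreUri(match["datastore"], match["path"])


if __name__ == "__main__":
    # Example values are made up for illustration.
    parts = parse_datastore_uri(
        "azureml://datastores/workspaceblobstore/paths/data/iris.csv"
    )
    print(parts.datastore, parts.path)
```

Splitting the URI this way only shows which datastore and path an asset points at; it does not download anything by itself.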



Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 2
  • Comments: 12 (1 by maintainers)

Top GitHub Comments

3 reactions
tomasvanpottelbergh commented, Oct 4, 2022

Hi @SturgeonMi, I briefly tried this out, but couldn’t get the authentication to work. Anyway, I found a workaround using azure.ai.ml._artifacts._artifact_utilities.download_artifact_from_aml_uri. This is definitely not a great solution, since it’s a “private” API, but I hope that this functionality will get exposed publicly in the azure-ai-ml package at some point.
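Because the workaround relies on a private module, it may move or disappear between azure-ai-ml releases. One way to guard against that is to resolve the function defensively at import time. The sketch below uses only the standard library; `resolve_callable` is a hypothetical helper name, and the module and function names passed to it are the ones quoted in this comment.

```python
import importlib
from typing import Callable, Optional


def resolve_callable(module_name: str, attr_name: str) -> Optional[Callable]:
    """Return module_name.attr_name if it can be imported and is callable, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Package not installed, or the private module was moved or removed.
        return None
    attr = getattr(module, attr_name, None)
    return attr if callable(attr) else None


# Private API named in the comment above; it may not exist in every release.
download_fn = resolve_callable(
    "azure.ai.ml._artifacts._artifact_utilities",
    "download_artifact_from_aml_uri",
)

if download_fn is None:
    print("Private download helper not available in this environment.")
else:
    print("Private download helper found.")
```

Checking for `None` up front gives a clear message instead of an `ImportError` deep inside exploration code.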

2 reactions
jomalsan commented, Oct 11, 2022

+1, exposing this functionality would be great. To expand on @tomasvanpottelbergh's solution, I was able to download locally using the following:

import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import azure.ai.ml._artifacts._artifact_utilities as artifact_utils

subscription_id = ""
resource_group = ""
workspace = ""

dataset_name = ""
dataset_version = ""
downloaded_data_folder = "./data"

# Get the client
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# Lookup the dataset to get the 'path'
data_info = ml_client.data.get(name=dataset_name, version=dataset_version)

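# Download the dataset (private API; may change between releases)
artifact_utils.download_artifact_from_aml_uri(
    uri=data_info.path,
    destination=downloaded_data_folder,
    datastore_operation=ml_client.datastores,
)

# Verify it is downloaded.
# Assumes the asset is a single file and that the path starts with
# "azureml://" (10 characters, dropped by the slice).
file_path = os.path.basename(data_info.path[10:])
assert os.path.exists(os.path.join(downloaded_data_folder, file_path))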

