dvc pull: failed to pull data from the Azure Blob in GitHub Actions

Bug Report

Description

dvc pull fails in GitHub Actions with an auth issue when accessing Azure Blob Storage. The same actions worked yesterday, and still work correctly on my dev machine. In GitHub Actions I receive the following error message:

ERROR: failed to pull data from the cloud - Authentication to Azure Blob Storage via account key failed. Learn more about configuration settings at <https://man.dvc.org/remote/modify>: unable to connect to account for Unable to create async transport. Please check aiohttp is installed.

Reproduce

GitHub Actions config to reproduce

on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8' 
          architecture: 'x64' 
      - uses: iterative/setup-cml@v1
      - name: PiP Install
        run: pip install -r requirements.txt
      - name: Train model
        run: |
          dvc doctor
          dvc pull

Expected

dvc pull succeeds, downloading my output files configured in the dvc pipeline file. Running dvc repro after this also succeeds, skipping all stages since the latest output is available.

Environment information

The issue only reproduces in GitHub Actions. It seemingly started today at random: the same auth setup worked yesterday in Actions, where I had several successful pipeline runs and PRs. Azure Blob Storage is configured in .dvc/config with account_name and account_key. I can verify that it works locally by deleting .dvc/cache and running dvc pull, after which the cache is recreated correctly. Rerunning dvc pull locally reports Everything is up to date. dvc push also works.

Output of dvc doctor:

# In GitHub Actions where the problem occurs
$ dvc doctor
DVC version: 2.8.2 (pip)
---------------------------------
Platform: Python 3.8.12 on Linux-5.8.0-1042-azure-x86_64-with-glibc2.2.5
Supports:
	azure (adlfs = 2021.10.0, knack = 0.8.2, azure-identity = 1.7.0),
	gdrive (pydrive2 = 1.10.0),
	webhdfs (fsspec = 2021.10.1),
	http (aiohttp = 3.8.0, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.0, aiohttp-retry = 2.4.6)

Additional Information (if any):

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

hosszubalazs commented, Nov 1, 2021 (2 reactions)

aiohttp 3.8.0 seems to cause the problem. Downgrading to 3.7.4 is a possible workaround:

DVC version: 2.8.2 (pip)
---------------------------------
Platform: Python 3.8.12 on Linux-5.8.0-1042-azure-x86_64-with-glibc2.2.5
Supports:
	azure (adlfs = 2021.10.0, knack = 0.8.2, azure-identity = 1.7.0),
	gdrive (pydrive2 = 1.10.0),
	webhdfs (fsspec = 2021.10.1),
	http (aiohttp = 3.7.4, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.7.4, aiohttp-retry = 2.4.6)

skshetry commented, Nov 2, 2021 (1 reaction)

This has been fixed upstream, so it should be resolved automatically.
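
For anyone who still hits the error before the upstream fix reaches their environment, a small pre-flight check can confirm whether the CI runner installed the aiohttp release named in the thread. The sketch below is only an illustration: the helper names `is_affected` and `check_installed_aiohttp` are hypothetical, and treating 3.8.0 as the only affected version is an assumption based solely on the report above, not on anything confirmed by the maintainers.

```python
from importlib import metadata

# Assumption: only aiohttp 3.8.0 is treated as affected, because that is the
# single version named in the thread above.
AFFECTED_VERSIONS = {"3.8.0"}


def is_affected(version: str) -> bool:
    """Return True if the version string is one reported as problematic."""
    return version.strip() in AFFECTED_VERSIONS


def check_installed_aiohttp() -> str:
    """Describe the installed aiohttp version relative to the report."""
    try:
        version = metadata.version("aiohttp")
    except metadata.PackageNotFoundError:
        return "aiohttp is not installed"
    if is_affected(version):
        # 3.7.4 is the workaround version mentioned in the thread.
        return (
            f"aiohttp {version} matches the reported problem; "
            "consider pinning aiohttp==3.7.4"
        )
    return f"aiohttp {version} is not the version reported as problematic"


if __name__ == "__main__":
    print(check_installed_aiohttp())
```

Running this script in the workflow before dvc pull would print which case applies. If it reports 3.8.0, adding aiohttp==3.7.4 to requirements.txt applies the workaround from the first comment.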
