
'az storage blob download-batch' extremely slow (40x slower than azcopy)


az feedback auto-generates most of the information requested below, as of CLI version 2.0.62

Describe the bug

I have been using the AzureCLI pipeline task to download files from a Data Lake with the az tool because the AzureFileCopy task isn’t available on Linux (only on Windows).

Because there is no support for supplying multiple patterns to az storage blob download-batch, I need multiple invocations, which additionally slows down the download due to https://github.com/Azure/azure-cli/issues/9444.

Here is the full example of how I use the az tool in my pipeline (a single-task loop variant is sketched just after it):

parameters:
- name: file_patterns
  type: object
  default:
  - rf/*c_*b.nc
  - calibration/left.nc
  - calibration/right.nc
  - phase_two/*/left.nc
  - phase_two/*/right.nc

jobs:
- job: "download files"
  pool: my-vmss-agents-pool
  steps:
  - ${{ each file_pattern in parameters.file_patterns }}:
    - task: AzureCLI@1
      displayName: Copy ${{ file_pattern }} from Data Lake
      inputs:
        scriptType: bash
        azureSubscription: my-vmssagents-service-connection
        scriptLocation: inlineScript
        inlineScript: |
          az storage blob download-batch \
            --source "data" \
            --account-name "data" \
            --max-connections=6 \
            --destination "$(Build.Repository.LocalPath)" \
            --pattern "${{ file_pattern }}"

The total time to download the 5.7 GB, distributed over 8 files, is 492 seconds.

While using azcopy copy (for which I, unfortunately, need some tricky boilerplate to authenticate), it takes 13 seconds to download the exact same files!

I have verified that the azcopy tool downloads the exact same files.
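
For reference, a minimal verification sketch, assuming the two tools downloaded into hypothetical via_az/ and via_azcopy/ directories:

# Compare relative paths and checksums of the two download trees.
# The directory names are placeholders, not taken from the pipeline above.
diff \
  <(cd via_az && find . -type f -exec md5sum {} + | sort -k 2) \
  <(cd via_azcopy && find . -type f -exec md5sum {} + | sort -k 2)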

For completeness, I used this single pipeline task (because azcopy allows for multiple patterns):

- task: AzureCLI@2
  displayName: Download using azcopy
  inputs:
    azureSubscription: my-vmssagents-service-connection
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      export STORE_NAME="data"
      export CONTAINER_NAME="data"

      NOW=`date +"%Y-%m-%dT%H:%M:00Z"` \
      EXPIRY=`date -d "$NOW + 1 day" +"%Y-%m-%dT%H:%M:00Z"` \
      && export SAS_TOKEN=$( az storage container generate-sas \
          --account-name $STORE_NAME \
          --name $CONTAINER_NAME \
          --start $NOW \
          --expiry $EXPIRY \
          --permissions acdlrw \
          --output tsv )

      $(Agent.ToolsDirectory)/azcopy/azcopy copy \
          "https://${STORE_NAME}.blob.core.windows.net/${CONTAINER_NAME}/${{ parameters.folder }}/?${SAS_TOKEN}" \
          "." --recursive --include-pattern "*c_*b.nc;left.nc;right.nc"

This is a screenshot of the different tasks of a pipeline run, including the time each task took to complete:

[Screenshot: pipeline task durations]

To Reproduce

Create a bunch of files on a storage container and download them.
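
A minimal repro sketch (the "data"/"data" account and container names are the placeholders used in the pipeline above):

# Generate ~5.7 GB across 8 files, upload them, then time the batch download.
mkdir -p repro
for i in $(seq 1 8); do
  head -c 700M /dev/urandom > "repro/file_${i}.nc"
done
az storage blob upload-batch --account-name data --destination data --source repro
time az storage blob download-batch --account-name data --source data \
  --destination ./downloaded --pattern "*.nc"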

Expected behavior

That az storage blob download-batch has performance similar to azcopy.

Environment summary

The Pipelines Agents run on Standard_E8s_v3 Azure VMs with the following cloud-init file:

#cloud-config
package_update: true
packages:
  - gcc
  - git-lfs
  - git
runcmd:
  - export MINICONDA_VERSION=4.8.2
  - export CONDA_VERSION=4.8.2
  - wget --quiet https://repo.continuum.io/miniconda/Miniconda3-py37_${MINICONDA_VERSION}-Linux-x86_64.sh
  - /bin/bash Miniconda3-py37_${MINICONDA_VERSION}-Linux-x86_64.sh -f -b -p /opt/conda
  - rm Miniconda3-py37_${MINICONDA_VERSION}-Linux-x86_64.sh
  - echo ". /opt/conda/etc/profile.d/conda.sh" >> /home/AzDevOps/.bashrc
  - /opt/conda/bin/conda config --system --prepend channels conda-forge
  - /opt/conda/bin/conda install python=3.8 netcdf4 numpy pandas xarray tqdm dask qcodes
  - chmod 777 -R /opt/conda
  - curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

When logging into an initialized VM:

a-banijh@banijh-vm00004I:/agent/_work/1/s$ az  --version
azure-cli                          2.9.0

command-modules-nspkg              2.0.3
core                               2.9.0
nspkg                              3.0.4
telemetry                          1.0.4

Python location '/opt/az/bin/python3'
Extensions directory '/home/a-banijh/.azure/cliextensions'

Python (Linux) 3.6.10 (default, Jul 10 2020, 07:17:28)
[GCC 7.5.0]

Legal docs and information: aka.ms/AzureCliLegal

Your CLI is up-to-date.

Additional context

Other people also report the same issue with az storage file download-batch on StackOverflow (I assume it’s using the same functions).

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
Juliehzl commented, Aug 4, 2020

Hi @basnijholt, for this issue we recommend using az storage copy if you want better performance. As we know, AzCopy has very good performance, and the CLI wants to leverage AzCopy given our limited bandwidth. We are currently continuing to work on the azcopy integration and will do more as part of fixing https://github.com/Azure/azure-cli/issues/10741.

If you have any other concern, feel free to let me know.
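
For reference, a hedged sketch of the suggested command, reusing the placeholder account/container names from the pipeline above (az storage copy drives AzCopy under the hood):

# Sketch of the recommended az storage copy invocation.
az storage copy \
  --source-account-name data --source-container data \
  --destination "." \
  --recursive --include-pattern "*c_*b.nc;left.nc;right.nc"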

0 reactions
basnijholt commented, Aug 3, 2020

How is this issue solved exactly? Could you please point me to the relevant PRs/commits?
