
azure.identity.ClientSecretCredential object can't be (cloud)pickled

See original GitHub issue
  • Package Name: azure.identity
  • Package Version: 1.7.1
  • Operating System: Ubuntu 20.04.3 LTS
  • Python Version: 3.9.7

Describe the bug
An azure.identity.ClientSecretCredential object can't be (cloud)pickled, which makes it unusable in multiprocessing contexts such as Dask.

To Reproduce
Steps to reproduce the behavior:

import os

from azure.identity import ClientSecretCredential
import cloudpickle  # version 2.0.0

credential = ClientSecretCredential(
    tenant_id=os.getenv('ADLFS_TENANT_ID'),  # with valid env variables set
    client_id=os.getenv('ADLFS_CLIENT_ID'),
    client_secret=os.getenv('ADLFS_CLIENT_SECRET'),
    authority='login.microsoftonline.com/')
cloudpickle.dumps(credential)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_31443/3356912537.py in <module>
      9     client_secret=os.getenv('ADLFS_CLIENT_SECRET'),
     10     authority='login.microsoftonline.com/')
---> 11 cloudpickle.dumps(credential)

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
     71                 file, protocol=protocol, buffer_callback=buffer_callback
     72             )
---> 73             cp.dump(obj)
     74             return file.getvalue()
     75 

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
    600     def dump(self, obj):
    601         try:
--> 602             return Pickler.dump(self, obj)
    603         except RuntimeError as e:
    604             if "recursion" in e.args[0]:

TypeError: cannot pickle '_thread._local' object

Note that sometimes the error is TypeError: cannot pickle '_thread.RLock' object instead. This seems to be due to two different offending locks in the ClientSecretCredential object, i.e. _cache (a _thread.RLock) and _client (a _thread._local).
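To confirm which attributes are the culprits, each instance attribute can be pickled on its own (a small diagnostic sketch, not part of the original report; it assumes the same environment variables as above):

import os
import pickle

from azure.identity import ClientSecretCredential

credential = ClientSecretCredential(
    tenant_id=os.getenv('ADLFS_TENANT_ID'),
    client_id=os.getenv('ADLFS_CLIENT_ID'),
    client_secret=os.getenv('ADLFS_CLIENT_SECRET'),
    authority='login.microsoftonline.com/')

# Try to pickle each instance attribute individually and report the failures.
for name, value in vars(credential).items():
    try:
        pickle.dumps(value)
    except Exception as exc:
        print(f'{name}: {exc!r}')
# The attributes holding the lock/thread-local state (per the tracebacks,
# _client and _cache) are the ones expected to fail here.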

Expected behavior
I expect to be able to (cloud)pickle the ClientSecretCredential object such that it can be used in a multiprocessing setting.

Additional context
In data science tasks, multiprocessing via e.g. Dask is often essential. Thus, I think it is essential that objects like ClientSecretCredential can be pickled, so that Azure plays nicely with the scientific Python ecosystem.

Specifically, my use case is saving Xarray datasets in an Azure Data Lake, e.g. something like

import os

from azure.identity import ClientSecretCredential
from azure.storage.blob import ContainerClient
import dask.array
import numpy as np
import xarray as xr
import zarr

# Setup ADL connection
credential = ClientSecretCredential(
    tenant_id=os.getenv('ADLFS_TENANT_ID'),
    client_id=os.getenv('ADLFS_CLIENT_ID'),
    client_secret=os.getenv('ADLFS_CLIENT_SECRET'),
    authority='login.microsoftonline.com/')
container_client = ContainerClient(
    account_url='https://{my_storage_account}.blob.core.windows.net',
    container_name='{my_container}',
    credential=credential)
store = zarr.storage.ABSStore(
    client=container_client,
    prefix='test/')
group = 'ds.zarr'

# Create dummy data
ds = xr.Dataset(
    data_vars={"var_1": (('x'), dask.array.from_array(np.random.rand(10), chunks=2))},
    coords={'x': np.arange(10)})

# Save xarray to Zarr in ADL
ds.to_zarr(store=store, group=group, compute=False).compute(scheduler='processes')  # Compute using processes (not threads)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_33027/4096524395.py in <module>
     28 
     29 # Save xarray to Zarr in ADL
---> 30 ds.to_zarr(store=store, group=group, compute=False).compute(scheduler='processes')  # Compute using processes (not threads)

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/dask/base.py in compute(self, **kwargs)
    286         dask.base.compute
    287         """
--> 288         (result,) = compute(self, traverse=False, **kwargs)
    289         return result
    290 

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/dask/base.py in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    569         postcomputes.append(x.__dask_postcompute__())
    570 
--> 571     results = schedule(dsk, keys, **kwargs)
    572     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    573 

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/dask/multiprocessing.py in get(dsk, keys, num_workers, func_loads, func_dumps, optimize_graph, pool, chunksize, **kwargs)
    217     try:
    218         # Run
--> 219         result = get_async(
    220             pool.submit,
    221             pool._max_workers,

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    493             # Main loop, wait on tasks to finish, insert new ones
    494             while state["waiting"] or state["ready"] or state["running"]:
--> 495                 fire_tasks(chunksize)
    496                 for key, res_info, failed in queue_get(queue).result():
    497                     if failed:

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/dask/local.py in fire_tasks(chunksize)
    475                         (
    476                             key,
--> 477                             dumps((dsk[key], data)),
    478                             dumps,
    479                             loads,

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
     71                 file, protocol=protocol, buffer_callback=buffer_callback
     72             )
---> 73             cp.dump(obj)
     74             return file.getvalue()
     75 

/usr/local/continuum/miniconda3/envs/py39_ymp576_latest/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
    600     def dump(self, obj):
    601         try:
--> 602             return Pickler.dump(self, obj)
    603         except RuntimeError as e:
    604             if "recursion" in e.args[0]:

TypeError: cannot pickle '_thread._local' object

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 5
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
Chroxvi commented, Dec 13, 2021

A workaround to this problem is to delay the instantiation of the ClientSecretCredential object until it is actually needed. That way we can simply pass around the tenant/client ID and secret (which are plain strings that can be pickled) in the Dask cluster.
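A minimal sketch of that pattern (the make_credential helper is illustrative, not part of the SDK): the kwargs and a factory function are pickled instead of the credential itself, and the credential is only built on the receiving side.

import os

import cloudpickle
from azure.identity import ClientSecretCredential


def make_credential(credential_kwargs):
    # Build the credential lazily; only plain strings cross the pickle boundary.
    return ClientSecretCredential(
        authority='login.microsoftonline.com/',
        **credential_kwargs)


credential_kwargs = {
    'tenant_id': os.getenv('ADLFS_TENANT_ID'),
    'client_id': os.getenv('ADLFS_CLIENT_ID'),
    'client_secret': os.getenv('ADLFS_CLIENT_SECRET')}

# This round trip succeeds because neither the function nor the kwargs dict
# holds any lock or thread-local state.
payload = cloudpickle.dumps((make_credential, credential_kwargs))
factory, kwargs = cloudpickle.loads(payload)
credential = factory(kwargs)  # instantiated on the worker side, after unpickling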

Implementing this workaround for my use case is not trivial though, as the current interface to the Xarray/Zarr setup requires an instantiated credential object. However, I have come up with the following (admittedly ugly) ProxyCredential class, which lets me pretend to have an instantiated credential object while only instantiating the ClientSecretCredential when it is actually needed. This obviously comes with some overhead, but not so much that it becomes useless.

from azure.identity import ClientSecretCredential


class ProxyCredential:
    """
    Wrap ClientSecretCredential to delay its instantiation such that it may be
    pickled.
    """
    def __init__(self, credential_kwargs):
        """
        Parameters
        ----------
        credential_kwargs : dict
            The tenant_id, client_id, and client_secret to use with the
            ClientSecretCredential.
        """
        self.credential_kwargs = credential_kwargs

    def __getattr__(self, name):
        """Intercept all attribute access and method calls."""
        credential_kwargs = super().__getattribute__('credential_kwargs')
        credential = ClientSecretCredential(
            authority='login.microsoftonline.com/',
            **credential_kwargs)
        attr = getattr(credential, name)
        if callable(attr):
            # If the attribute is a method, wrap it
            def wrapper(*args, **kwargs):
                # Create a new ClientSecretCredential inside the wrapper
                # to further delay its instantiation
                credential = ClientSecretCredential(
                    authority='login.microsoftonline.com/',
                    **credential_kwargs)
                method = getattr(credential, name)
                return method(*args, **kwargs)
            return wrapper
        else:
            return attr

Now, using the ProxyCredential, my use case works as expected, i.e. the following example now works:

credential_kwargs = {
    'tenant_id': os.getenv('ADLFS_TENANT_ID'),
    'client_id': os.getenv('ADLFS_CLIENT_ID'),
    'client_secret': os.getenv('ADLFS_CLIENT_SECRET')}
credential = ProxyCredential(credential_kwargs)
container_client = ContainerClient(
    account_url='https://{my_storage_account}.blob.core.windows.net',
    container_name='{my_container}',
    credential=credential)

store = zarr.storage.ABSStore(
    client=container_client,
    prefix='test/')
group = 'ds.zarr'

# Create dummy data
ds = xr.Dataset(
    data_vars={"var_1": (('x'), dask.array.from_array(np.random.rand(10), chunks=2))},
    coords={'x': np.arange(10)})

# Save xarray to Zarr in ADL
ds.to_zarr(store=store, group=group, compute=False).compute(scheduler='processes')
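As a quick sanity check (a sketch; it assumes the credential and credential_kwargs from the snippet above), the proxy itself also survives the (cloud)pickle round trip that failed for the raw ClientSecretCredential, since it only stores plain strings:

import cloudpickle

restored = cloudpickle.loads(cloudpickle.dumps(credential))
assert restored.credential_kwargs == credential_kwargs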
1 reaction
Chroxvi commented, May 30, 2022

Doing a /unresolved to get some feedback on @dhirschfeld's suggestion for making ManagedIdentityCredential picklable, which seems very reasonable to me.
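For reference, one shape such pickling support could take (a sketch only, not necessarily what that suggestion proposes, shown here for ClientSecretCredential) is to rebuild the credential from its constructor arguments on unpickling rather than serializing the internal lock/thread-local state:

from azure.identity import ClientSecretCredential


def _rebuild_credential(cls, args, kwargs):
    # Module-level helper so that pickle can reference it by name.
    return cls(*args, **kwargs)


class PicklableClientSecretCredential(ClientSecretCredential):
    """Sketch: recreate the credential from its constructor arguments on
    unpickle instead of serializing the _thread.RLock/_thread._local state."""

    def __init__(self, tenant_id, client_id, client_secret, **kwargs):
        # Note: the client secret is stored here and therefore travels inside
        # the pickle payload, which may or may not be acceptable.
        self._ctor_args = (tenant_id, client_id, client_secret)
        self._ctor_kwargs = kwargs
        super().__init__(tenant_id, client_id, client_secret, **kwargs)

    def __reduce__(self):
        return _rebuild_credential, (type(self), self._ctor_args, self._ctor_kwargs)

Whether shipping the secret inside the pickle payload is acceptable would of course depend on the deployment.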

