Series.str.decode() turns arrays of strings to NaN and fails on byte strings

Hello,

this looks like a bug:

import numpy as np
import pandas as pd

x = np.array(['x', 'y'])
pd.Series(x).str.decode(encoding='UTF-8', errors='strict')
0   NaN
1   NaN
dtype: float64

the line above is also used in pytables.py:

data = Series(data).str.decode(encoding, errors=errors).values

… and leads to an error when reading HDF files written with pandas < 0.23. In some cases the text data was stored as byte strings (e.g. x = np.array([b'x', b'y'])), which raises the following:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
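
One possible way to sidestep both symptoms is to decode element by element and only touch values that really are bytes. The sketch below is an illustration, not something proposed in the issue: the helper name decode_if_bytes is hypothetical, and it assumes that values which are already str should pass through unchanged.

```python
def decode_if_bytes(value, encoding="utf-8", errors="strict"):
    """Decode bytes to str; return any other value unchanged.

    Hypothetical helper: it avoids calling .decode on str objects,
    which have no such method in Python 3.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value


# Plain lists stand in for the array contents from the report.
decoded_text = [decode_if_bytes(v) for v in ["x", "y"]]
decoded_bytes = [decode_if_bytes(v) for v in [b"x", b"y"]]
```

With pandas, the same helper could presumably be applied through pd.Series(x).map(decode_if_bytes), which does not go through the .str accessor at all.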

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 5
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

4 reactions
Jywscher commented, Sep 25, 2018

Hello, is there any update on the issue? Since I ran into the same problem, it would be great if someone could help to solve it…

2 reactions
abrakababra commented, Sep 22, 2018

Finally did a little research: it’s taking the “except route” here:

def _map(f, arr, na_mask=False, na_value=np.nan, dtype=object):
    if not len(arr):
        return np.ndarray(0, dtype=dtype)

    if isinstance(arr, ABCSeries):
        arr = arr.values
    if not isinstance(arr, np.ndarray):
        arr = np.asarray(arr, dtype=object)
    if na_mask:
        mask = isna(arr)
        try:
            convert = not all(mask)
            result = lib.map_infer_mask(arr, f, mask.view(np.uint8), convert)
        except (TypeError, AttributeError) as e:  # <--- except route taken here
            # Reraise the exception if callable `f` got wrong number of args.
            # The user may want to be warned by this, instead of getting NaN
            if compat.PY2:
                p_err = r'takes (no|(exactly|at (least|most)) ?\d+) arguments?'
            else:
                p_err = (r'((takes)|(missing)) (?(2)from \d+ to )?\d+ '
                         r'(?(3)required )positional arguments?')

            if len(e.args) >= 1 and re.search(p_err, e.args[0]):
                raise e

            def g(x):
                try:
                    return f(x)
                except (TypeError, AttributeError):  # <--- except route taken here
                    return na_value

            return _map(g, arr, dtype=dtype)
        if na_value is not np.nan:
            np.putmask(result, mask, na_value)
            if result.dtype == object:
                result = lib.maybe_convert_objects(result)
        return result
    else:
        return lib.map_infer(arr, f)

The first except is triggered by lib.map_infer_mask, which in my installation lives in this file: \python3\lib\site-packages\pandas\_libs\lib.cp37-win_amd64.pyd
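
To make the except route easier to follow, here is a simplified, pure-Python sketch of the same control flow. It is not pandas code: map_or_na and decode_utf8 are hypothetical names, a list comprehension stands in for lib.map_infer_mask, and only the Python 3 branch of the p_err pattern quoted above is reproduced. It suggests why str input ends up as NaN: the AttributeError raised by the missing .decode method does not match the argument-count pattern, so it is swallowed and replaced element by element.

```python
import math
import re

# Python 3 branch of the pattern quoted above. It matches TypeErrors caused by
# a wrong number of arguments, which are re-raised instead of swallowed.
P_ERR = (r'((takes)|(missing)) (?(2)from \d+ to )?\d+ '
         r'(?(3)required )positional arguments?')


def map_or_na(func, values, na_value=math.nan):
    """Hypothetical stand-in for the except route in _map."""
    try:
        # Stand-in for lib.map_infer_mask: apply func to everything at once.
        return [func(v) for v in values]
    except (TypeError, AttributeError) as err:
        if err.args and re.search(P_ERR, str(err.args[0])):
            raise  # wrong argument count: surface it to the caller

        def guarded(v):
            # Per-element fallback: failures become na_value.
            try:
                return func(v)
            except (TypeError, AttributeError):
                return na_value

        return [guarded(v) for v in values]


def decode_utf8(value):
    return value.decode("utf-8", "strict")


from_text = map_or_na(decode_utf8, ["x", "y"])     # str has no .decode in Python 3
from_bytes = map_or_na(decode_utf8, [b"x", b"y"])  # bytes decode normally
```

In this sketch from_text contains only NaN values while from_bytes contains the decoded strings, which mirrors the first symptom in the report. The byte-string failure is a separate matter: the AttributeError quoted in the report comes from the .str accessor itself, before any mapping happens.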
