API: is_string_dtype is not strict

See original GitHub issue

https://github.com/pandas-dev/pandas/issues/15533#issuecomment-284270591

pandas.types.common.is_string_dtype is not strict: it only checks whether the dtype is a fixed-width NumPy string/unicode type or the object dtype, without looking at the values. It could be made strict with something like this:

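# Proposed sketch. np (numpy), lib, _get_dtype and is_period_dtype are
# assumed to be in scope, as they are inside pandas' own module.
def is_string_dtype(arr_or_dtype):
    dtype = _get_dtype(arr_or_dtype)
    # For actual arrays, inspect the values instead of trusting the dtype alone.
    if isinstance(arr_or_dtype, np.ndarray):
        if lib.infer_dtype(arr_or_dtype) not in ('string', 'unicode'):
            return False

    return dtype.kind in ('O', 'S', 'U') and not is_period_dtype(dtype)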

This would need performance checking to see whether it is a problem. It also changes the API slightly, because either an object or a dtype can be passed in and the result would now depend on which one is given. That is probably OK as long as it is documented.
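For readers who need a strict check today, the same idea can be approximated with public API only. The following is a minimal sketch, not the change proposed in the issue: `is_strictly_string` is a hypothetical helper name, it accepts array-like values only (not a bare dtype), and it assumes a pandas version where `pandas.api.types.infer_dtype` accepts the `skipna` argument.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def is_strictly_string(values) -> bool:
    """Hypothetical helper: True only if the values themselves are inferred as strings.

    Unlike is_string_dtype, this inspects the elements, so an object-dtype
    container of tuples or a categorical is rejected.
    """
    # infer_dtype returns labels such as "string", "mixed", "integer"
    # or "categorical" based on the values (or on the extension dtype).
    return infer_dtype(values, skipna=True) == "string"


strings = pd.Series(["a", "b"], dtype=object)
tuples = pd.Series([(0, 1), (1, 1)])
categorical = pd.Series(pd.Categorical([1, 2, 3]))

print(is_strictly_string(strings))
print(is_strictly_string(tuples))
print(is_strictly_string(categorical))
```

Because the values are scanned, this costs more than a dtype-only check on large object arrays, which is the performance concern raised above.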

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
ivirshup commented, Feb 6, 2019

Any updates on this? It seems like behavior changed in v0.24.0, but not necessarily for the better.

import pandas as pd
from pandas.api.types import is_string_dtype

is_string_dtype(pd.Series(pd.Categorical([1,2,3])))
# False in v0.23.4
# True in v0.24.1

# Still:
is_string_dtype(pd.Series([(0,1), (1,1)]))
# True in v0.23.4
# True in v0.24.1

This was causing issues in a project where we were expecting the results of this check not to change between versions. On further investigation, it seems like this function is unreliable: tuples aren’t strings, and treating categoricals as strings feels very R.
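When the result must not change between pandas versions, one option is to avoid dtype introspection entirely and test the elements in plain Python. This is only a sketch under stated assumptions: `all_elements_are_str` is a hypothetical helper, missing values are ignored, and an empty Series counts as "all strings" because the check is vacuously true.

```python
import pandas as pd


def all_elements_are_str(series: pd.Series) -> bool:
    """Hypothetical helper: True if every non-missing element is a Python str."""
    # dropna() removes None/NaN so missing values do not fail the check.
    return all(isinstance(value, str) for value in series.dropna())


print(all_elements_are_str(pd.Series(["a", None, "b"], dtype=object)))
print(all_elements_are_str(pd.Series([(0, 1), (1, 1)])))
print(all_elements_are_str(pd.Series(pd.Categorical([1, 2, 3]))))
```

Note that a categorical whose categories are strings passes this check, since iteration yields the underlying string values. The loop runs in Python, so it is slow on large Series.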

0 reactions
jreback commented, Mar 5, 2017

@ResidentMario you shouldn’t need this fyi


Top Results From Across the Web

pandas.api.types.is_string_dtype
The array or dtype to check. Returns. boolean. Whether or not the array or dtype is of the string dtype. Examples.

Caveats while checking dtype in pandas DataFrame
The reason this check is not strict is because there isn't a dedicated string dtype in pandas, and instead strings are just...
