Using .str functions
I have tried, perhaps incorrectly, to convert my column to the pyarrow string type as follows:
import fletcher as fr
import pyarrow as pa
fletcher_string_dtype = fr.FletcherDtype(pa.string())
df['string_col'] = df.string_col.astype(fletcher_string_dtype)
But now I can’t do string functions on it because I get the error message AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
Specifically, I’m trying to do .str.contains()
I may be casting the column incorrectly. It may also be that there’s no value in using fletcher for this.
I saw in your talk that groupby was a nice use case. Related to this question: what are the best use cases for this dtype? Just a link to some additional reading material would be great.
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (2 by maintainers)
Top GitHub Comments
That’s what I’m suggesting.
Unless you’re suggesting that the pandas default implementation will work directly with arrow data (in fletcher arrays) I’d disagree with this position - I don’t want to be forced to coerce my arrow data to pandas to do basic manipulations and I also don’t want the maintenance burden of 2 separate implementations.
I think pandas should make both .str and .dt available to be overridden by different (extension) dtypes with implementations that work / make sense / are performant for that data type.
The concept is similar to numpy’s __array_function__ protocol, whereby different array implementations can override the default numpy implementation, thereby allowing users to write generic code that works for numpy arrays, cupy arrays, sparse arrays, etc.
I’d like my transform functions to work seamlessly with either python/pandas strings or with arrow/fletcher strings. Of course, I don’t know if this may be an unreasonable hope given technical constraints, but I think it’s something worth striving for, with benefits similar to those provided by numpy’s NEP-18.