DISCUSSION: Add format parameter to .astype when converting to str dtype
I propose adding a string formatting option to .astype when converting to str dtype. I think it's reasonable to expect that you can choose the string format when converting to a string dtype, since you're essentially freezing a representation of your series, and plain .astype(str) is often too crude for this.
This option should take the form of a format parameter to .astype that accepts a string and can only be used when converting to str dtype. This would lessen the reliance on .apply for converting non-strings to more complex strings, make such conversions more readable (IMO), and maybe make them faster by avoiding .apply, which is slow (though I'm not very knowledgeable about such optimizations).
The current procedure for converting to a complex string goes like this:
In [1]: ser = pd.Series([-1, 1.234])
In [2]: ser.apply("{:+.1f} $".format)
0    -1.0 $
1    +1.2 $
dtype: object
I propose to make this possible:
In [3]: ser.astype(str, format="{:+.1f} $")
0    -1.0 $
1    +1.2 $
dtype: object
If the dtype parameter is not str, setting the format parameter should raise an exception. If format is not set, the current behaviour is used, so the proposed change is backward compatible.
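Until something like this exists in pandas, the proposed semantics can be approximated with a small wrapper. The sketch below is hypothetical: `astype_with_format` is not a pandas function. It maps the bound `str.format` method over the values (so it is no faster than `.apply`), raises when `format` is combined with a non-str dtype, and falls back to plain `.astype` when no format is given.

```python
import pandas as pd


def astype_with_format(ser, dtype, format=None):
    """Hypothetical stand-in for the proposed ser.astype(dtype, format=...)."""
    if format is None:
        # No format given: keep the current .astype behaviour unchanged.
        return ser.astype(dtype)
    if dtype not in (str, "str"):
        # The proposal only allows `format` when converting to str.
        raise ValueError("format can only be used when converting to str")
    # Format each element with str.format; the index is preserved.
    return ser.map(format.format)


ser = pd.Series([-1, 1.234])
print(astype_with_format(ser, str, format="{:+.1f} $"))
```

Because the wrapper only delegates to existing calls, it is a convenient way to try out the proposed API shape before deciding whether pandas itself should offer it.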
Also to consider:
Allowing a placeholder name
Should a placeholder name be available? Then you could do:
In [4]: ser = pd.Series(pd.date_range('2017-03', periods=2, freq='M'))
In [5]: ser.astype(str, format="Y{value.dt.year}-Q{value.dt.quarter}")
0    Y2017-Q1
1    Y2017-Q2
dtype: object
(Note that the example above assumes an implicit parameter on .astype whose default value is “value”, so adding a placeholder name is transparent. Note also that similar behaviour already exists in ser.dt.strftime, but please look at the principle rather than the concrete example.)
A downside to allowing a placeholder name could be the potential for abuse (stuffing too much logic into the format string) and possibly losing the option to vectorize (though this is not my area of expertise).
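A named placeholder can already be emulated by passing each element to `str.format` as a keyword argument. When formatting is done element-wise, `value` is bound to a scalar `pd.Timestamp`, so its fields are reached as `value.year` and `value.quarter` rather than through the `.dt` accessor. The helper `format_named` is hypothetical, and the two dates are the month-end dates produced by the `date_range` call above. The last assignment shows a vectorized alternative built from `.dt` fields and string concatenation, which bears on the vectorization concern.

```python
import pandas as pd


def format_named(ser, fmt, name="value"):
    """Hypothetical helper: format each element, exposing it under `name`."""
    return ser.map(lambda v: fmt.format(**{name: v}))


ser = pd.Series(pd.to_datetime(["2017-03-31", "2017-04-30"]))

# Element-wise: `value` is a pd.Timestamp, so use .year/.quarter directly.
labels = format_named(ser, "Y{value.year}-Q{value.quarter}")

# Vectorized alternative: build the same strings column-wise.
labels_vec = "Y" + ser.dt.year.astype(str) + "-Q" + ser.dt.quarter.astype(str)

print(labels)
print(labels_vec)
```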
Adding a .format method
Another option would be to add a .str.format or .format method to DataFrame/Series.
If .format is added to the .str namespace, it would only be usable for string DataFrames/Series (which I'd be quite OK with, if the format parameter is also available on .astype for other data types).
Alternatively, such a method could be available directly on all DataFrames/Series. Then you'd write ser.format('{:+.1f}') rather than ser.astype(str, format='{:+.1f}'). IMO, though, it would be inconsistent to have such a string conversion method directly on pandas objects but not for other types: why have .format but not .to_numeric as a DataFrame/Series method?
IMO, therefore, astype(str, format=...) combined with a .str.format method is better than adding a new .format method. .astype(str, format=...) makes it very obvious that we're changing to a string dtype, and .str.format(...) makes it clear that we're doing a string manipulation.
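For experimentation, a `ser.format(...)`-style method can be approximated without changing pandas by registering a custom Series accessor with `pd.api.extensions.register_series_accessor`. The accessor name `fmt` and its `format` method are hypothetical choices for this sketch; a custom accessor lives in its own namespace, and reusing an existing name such as `str` would shadow the built-in accessor.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("fmt")
class FormatAccessor:
    """Hypothetical accessor: ser.fmt.format(spec) formats every element."""

    def __init__(self, series):
        self._series = series

    def format(self, spec):
        # Apply str.format element by element, preserving the index.
        return self._series.map(spec.format)


ser = pd.Series([-1, 1.234])
print(ser.fmt.format("{:+.1f}"))
```

The namespaced form keeps the "this is a formatting operation" signal of .str.format while working on non-string dtypes, which is one way to prototype the trade-off discussed above.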
Selected comments from the discussion:
OK, given the discussion we are having in #18347, I'm more amenable to this.
This is something completely different: that converts the full DataFrame to a string representation, while here it is about converting values to formatted string values inside a DataFrame.
I like having some way to do this (but the question is indeed what kind of API), but I would also be OK with ending the discussion with the decision that it is not important enough to add specialized functionality, and that the s.apply("{..} ..".format) idiom is the recommended way here. But let's at least have that discussion.
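For reference, the `.format` idiom mentioned in the last comment also extends to DataFrames by formatting column by column, since `Series.map` accepts the same bound method as `Series.apply`. The column names and per-column format strings below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"price": [-1, 1.234], "qty": [3, 12]})

# Series: the recommended idiom, passing the bound str.format method.
price_str = df["price"].apply("{:+.1f} $".format)

# DataFrame: one format string per column, applied element-wise.
formats = {"price": "{:+.1f} $", "qty": "{:>3d} pcs"}
df_str = df.apply(lambda col: col.map(formats[col.name].format))

print(price_str)
print(df_str)
```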