DISCUSSION: Add format parameter to .astype when converting to str dtype

I propose adding a string formatting possibility to .astype when converting to str dtype: I think it’s reasonable to expect that you can choose the string format when converting to a string dtype, as you’re basically freezing a representation of your series, and just using .astype(str) for this is often too crude.

This should take the shape of a format parameter to .astype that takes a format string and can only be used when converting to string dtype. It would lessen the reliance on .apply for converting non-strings to more complex strings, make such conversions more readable (IMO), and maybe make them faster (as we'd be avoiding .apply, which is slow, though I'm not too knowledgeable about such optimizations).

The current procedure for converting to a complex string goes like this:

In [1]: ser = pd.Series([-1, 1.234])
In [2]: ser.apply("{:+.1f} $".format)
0    -1.0 $
1    +1.2 $
dtype: object

I propose to make this possible:

In [3]: ser.astype(str, format="{:+.1f} $")
0    -1.0 $
1    +1.2 $
dtype: object

If the dtype parameter is not str, setting the format parameter should raise an exception. If format is not set, the current behaviour is used. The proposed change is therefore backward compatible.
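
The rules above can be sketched with a plain helper function. This is only an illustration of the proposed semantics, not pandas API: the name astype_with_format is made up, the choice of ValueError is an assumption (the proposal only says "an exception"), and the check accepts only the built-in str type, not the "str" alias.

```python
import pandas as pd


def astype_with_format(ser, dtype, format=None):
    """Hypothetical stand-in for the proposed ser.astype(dtype, format=...)."""
    if format is None:
        # No format given: defer to the existing behaviour unchanged.
        return ser.astype(dtype)
    if dtype is not str:
        # The proposal asks for an exception here; ValueError is an assumption.
        raise ValueError("format can only be used when converting to str")
    # Apply the format string to each element in turn.
    return ser.map(format.format)


ser = pd.Series([-1, 1.234])
result = astype_with_format(ser, str, format="{:+.1f} $")
print(result.tolist())  # ['-1.0 $', '+1.2 $']
```

Because the helper falls through to ser.astype(dtype) when format is omitted, existing calls behave as before, which is the backward-compatibility argument made above.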

Also to consider:

Allowing a placeholder name

Should a placeholder name be available? Then you could do:

In [4]: ser = pd.Series(pd.date_range('2017-03', periods=2, freq='M'))
In [5]: ser.astype(str, format="Y{value.dt.year}-Q{value.dt.quarter}")
0    Y2017-Q1
1    Y2017-Q2
dtype: object

(Note that the example above assumes an implicit parameter on .astype with the default value “value”, so adding a placeholder name is transparent. Note also that the behaviour above is already present in ser.dt.strftime, but please look at the principle rather than the concrete example.)
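
For comparison, a named placeholder can already be emulated today with str.format applied per element. This is a minimal sketch, not the proposed API: it formats one Timestamp at a time, so the placeholder uses value.year rather than value.dt.year, and the dates are written out explicitly instead of using date_range.

```python
import pandas as pd

# Month-end dates for March and April 2017, written out explicitly.
ser = pd.Series(pd.to_datetime(["2017-03-31", "2017-04-30"]))

# "value" is the placeholder name; each element is passed in as a Timestamp.
template = "Y{value.year}-Q{value.quarter}"
result = ser.map(lambda ts: template.format(value=ts))

print(result.tolist())  # ['Y2017-Q1', 'Y2017-Q2']
```

Since this calls the Python formatter once per element, it also illustrates the vectorization concern raised below.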

A downside to allowing a placeholder name could be the potential for abuse (stuffing too much into the format string) and possibly losing the option to vectorize (though this is not my expertise).

Adding a .format method

Adding a .str.format or .format method to DataFrame/Series could also be considered.

If .format is added to the .str namespace it would only be usable for string dataframes/series (which I’d be quite ok with, if the format parameter is also available on .astype for other data types).

Alternatively, such a method could be available directly on all DataFrames/Series. Then you’d do ser.format('{:+.1f}') rather than ser.astype(str, format='{:+.1f}'). IMO though, it would be inconsistent to have such a string conversion method directly on pandas objects, but not equivalent methods for other types. Why have .format but not .to_numeric as a DataFrame/Series method?

IMO therefore, astype(str, format=...) combined with a .str.format method is better than adding a new .format method for this. So:

  • .astype(str, format=...) makes it very obvious that we’re now changing to string datatype, and
  • .str.format(...) makes it clear that we’re doing a string manipulation.
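
To get a feel for how a directly callable formatting method would read, pandas' public accessor-registration hook can be used to prototype one. This is a rough sketch under assumptions: the accessor name fmt and the class FormatAccessor are invented for illustration and are not part of pandas or of this proposal.

```python
import pandas as pd


# "fmt" is a made-up accessor name chosen to avoid clashing with pandas attributes.
@pd.api.extensions.register_series_accessor("fmt")
class FormatAccessor:
    def __init__(self, series):
        self._series = series

    def __call__(self, template):
        # Format every element with the given format string.
        return self._series.map(template.format)


ser = pd.Series([-1, 1.234])
formatted = ser.fmt("{:+.1f}")
print(formatted.tolist())  # ['-1.0', '+1.2']
```

A prototype like this works for any element type that the format string accepts, which is the main difference from a method that lives only in the .str namespace.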

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 6
  • Comments: 19 (18 by maintainers)

Top GitHub Comments

1 reaction
jreback commented, Nov 26, 2017

OK, given the discussion we are having on #18347, I am more amenable to this.

1 reaction
jorisvandenbossche commented, Aug 16, 2017

We already have .to_string(…)

This is something completely different. That converts the full dataframe to a string representation, while here it is about converting values to formatted string values inside a dataframe.

I like having some way to do this (but the question is indeed in what kind of API), but I would also be OK to end the discussion with the decision that it is not important enough to add specialized functionality and that using the s.apply("{..} ..".format) idiom is the recommended way here. But let’s at least have that discussion.
