Adding a column to a DataFrame always creates a copy of a Series
I don't know if this is valid behaviour, but it seems to me like a bug?
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame(s)
>>> df.index is s.index
True
>>> df.iloc[0, 0] = 33
>>> df
    0
0  33
1   2
2   3
>>> s
0    33
1     2
2     3
dtype: int64
So far so good.
But if I do:
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame(index=s.index)
>>> df[0] = s
>>> df.index is s.index
True
>>> df.iloc[0, 0] = 33
>>> df
    0
0  33
1   2
2   3
>>> s
0    1
1    2
2    3
dtype: int64
Basically there is no way to add a column to the DataFrame without creating a copy of the data. This seems like a suboptimal behaviour since the operation:
df['c'] = df['a'] + df['b']
first creates a Series object in memory and then creates a copy of it that gets assigned to the DataFrame column c.
I also understand why this can be a desired behaviour, so maybe this issue could be reformulated as a question: is there a way to add a column to a DataFrame without creating a copy of the data?
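For anyone who wants to check this on their own installation, here is a minimal sketch that compares the two cases above. The helper name `shares_memory` is my own and not part of pandas; it relies on `np.shares_memory` and assumes a plain integer Series. Whether either case reports shared memory may depend on the pandas version and on whether Copy-on-Write is enabled, so the script prints the result instead of assuming it.

```python
import numpy as np
import pandas as pd


def shares_memory(df: pd.DataFrame, column, s: pd.Series) -> bool:
    """Hypothetical helper: True if df[column] and s are backed by overlapping memory."""
    return bool(np.shares_memory(df[column].to_numpy(), s.to_numpy()))


s = pd.Series([1, 2, 3])

# Case 1: build the DataFrame directly from the Series.
df_from_series = pd.DataFrame(s)

# Case 2: add the Series as a column to an existing DataFrame.
df_assigned = pd.DataFrame(index=s.index)
df_assigned[0] = s

# The answers below can differ between pandas versions.
print("pandas", pd.__version__)
print("constructor shares memory:", shares_memory(df_from_series, 0, s))
print("column assignment shares memory:", shares_memory(df_assigned, 0, s))

# Writing through the DataFrame built by column assignment leaves s untouched,
# matching the second session above.
df_assigned.iloc[0, 0] = 33
print(s.tolist())
```

The final write is the part that matches what I observed: after `df[0] = s`, changing the DataFrame does not change the original Series.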
Issue Analytics
- State:
- Created 10 years ago
- Comments:15 (14 by maintainers)
+1 for deprecating copy from the public API. I also suggest that the special behavior of the assignment operator be prominently announced in the relevant docstrings and in 10 Minutes to pandas. You have to dig around quite a bit in the source to figure out that:
df['x'] = df['y']
is actually:
or (because of the redundancy in the public API):
While I appreciate the argument that this case is special enough to break with the expected behavior of the language in which you’ve chosen to implement this library because the core devs perceive it as the default use case in this context, it is not such an obvious change that it’s reasonable to leave people to figure it out on their own.
It's only possible to avoid the copy in very limited circumstances (which IMHO are not necessary anyhow), so go ahead and close.