Adding a column to a DataFrame always creates a copy of a Series


I don’t know if this is valid behaviour, but it looks like a bug to me.

>>> import pandas as pd
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame(s)
>>> df.index is s.index
True
>>> df.iloc[0, 0] = 33
>>> df
    0
0  33
1   2
2   3
>>> s
0    33
1     2
2     3
dtype: int64

So far so good.

But if I do:

>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame(index=s.index)
>>> df[0] = s
>>> df.index is s.index
True
>>> df.iloc[0, 0] = 33
>>> df
    0
0  33
1   2
2   3
>>> s
0    1
1    2
2    3
dtype: int64

Basically, there is no way to add a column to a DataFrame without creating a copy of the data. This seems suboptimal, since the operation:

df['c'] = df['a'] + df['b']

first creates a Series object in memory, and then creates a copy of it that gets assigned to the DataFrame column c.

I also understand why this can be the desired behaviour, so maybe this issue could be reformulated as a question: is there a way to add a column to a DataFrame without creating a copy of the data?
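One way to check the behaviour described above is to write to the new column and then inspect the original Series. The following is a minimal sketch, assuming pandas and NumPy are installed; the helper name `add_column_and_check` is hypothetical and only used for illustration. It relies on `numpy.shares_memory` to report whether the two buffers overlap after the write.

```python
import numpy as np
import pandas as pd


def add_column_and_check(values):
    """Add a Series as a column, write to the column, and report the outcome.

    Hypothetical helper for illustration; not part of pandas.
    """
    s = pd.Series(values)
    df = pd.DataFrame(index=s.index)
    df["c"] = s              # column assignment under test
    df.iloc[0, 0] = 33       # write through the DataFrame
    # After the write, compare the underlying buffers of the column and the Series.
    shares = np.shares_memory(df["c"].to_numpy(), s.to_numpy())
    return s, df, shares


s, df, shares = add_column_and_check([1, 2, 3])
print(s.tolist())        # original Series values
print(df["c"].tolist())  # column values after the write
print(shares)            # whether the buffers overlap after the write
```

If the Series is left untouched and `shares` is `False`, the column and the Series no longer refer to the same data, which matches the second session in the report.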

Issue Analytics

  • State: closed
  • Created 10 years ago
  • Comments: 15 (14 by maintainers)

Top GitHub Comments

1 reaction
charlescbeebe commented, Nov 1, 2018

+1 for deprecating copy from the public API – I also suggest that the special behavior of the assignment operator be prominently announced in the relevant docstrings and in 10 Minutes to pandas. You have to dig around quite a bit in the source to figure out that:

df['x'] = df['y']

is actually:

from copy import deepcopy
df['x'] = deepcopy(df['y'])

or (because of the redundancy in the public API):

df['x'] = df['y'].copy()

While I appreciate the argument that this case is special enough to break with the expected behavior of the language in which you’ve chosen to implement this library because the core devs perceive it as the default use case in this context, it is not such an obvious change that it’s reasonable to leave people to figure it out on their own.
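The copy-like behaviour of column assignment described in this comment can be observed with a short experiment. This is a sketch, assuming pandas is installed; the column names follow the comment, and the values are made up for illustration. It shows only the observable effect, not how pandas implements it internally.

```python
import pandas as pd

df = pd.DataFrame({"y": [1, 2, 3]})

# Plain assignment of one column to a new column name.
df["x"] = df["y"]

# Modify the source column afterwards.
df.loc[0, "y"] = 100

x_values = df["x"].tolist()
y_values = df["y"].tolist()
print(x_values)  # the new column keeps the values it had at assignment time
print(y_values)  # the source column reflects the later write
```

From the caller's point of view, the two columns behave as independent data after the assignment, which is the same outcome as assigning `df['y'].copy()` explicitly.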

0 reactions
jreback commented, Aug 19, 2013

It's only possible to avoid the copy in very limited circumstances (which IMHO are not necessary anyhow), so go ahead and close.


