Issue with to_long_format()

I have a dataset which looks like this:

Transforming it into the long format with to_long_format(data, duration_col='time2') produces this:

Note how the start column has zeros. Either there is an issue here or I misunderstood how this function works.

Issue Analytics

Created 4 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

CamDavidsonPilon commented, Aug 21, 2019

You’re right, this is confusing. The purpose of to_long_format needs to be expanded, gimme a few hours to make this better.

sursu commented, Aug 21, 2019

OK, my case can be covered like this:

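# order rows by subject and time, then treat each time as the end of an interval
data.sort_values(['id','time2'], inplace=True)
data.rename(columns={'time2':'stop'}, inplace=True)
# each interval starts where the previous one for the same id stopped (0 for the first)
data['start'] = data.groupby('id')['stop'].shift().fillna(0)
# move 'stop' to the last column, right after 'start'
cols = [col for col in data.columns if col != 'stop'] + ['stop']
data = data[cols]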
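
As a rough check of the steps above, here is the same logic wrapped in a small function and run on toy data. The function name add_start_stop and the toy values are made up for illustration and are not part of lifelines or of the dataset in the issue.

```python
import pandas as pd

def add_start_stop(data, id_col="id", time_col="time2"):
    """Return a copy with 'start'/'stop' columns built from consecutive times per id."""
    out = data.sort_values([id_col, time_col]).rename(columns={time_col: "stop"})
    # Each interval starts where the previous one for the same id stopped;
    # the first interval of every id starts at 0.
    out["start"] = out.groupby(id_col)["stop"].shift().fillna(0)
    # Put 'stop' last, right after 'start'.
    cols = [c for c in out.columns if c != "stop"] + ["stop"]
    return out[cols].reset_index(drop=True)

# Hypothetical toy data, deliberately unsorted.
toy = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "time2": [3, 1, 2, 5, 4],
    "x": [0.3, 0.1, 0.2, 0.5, 0.4],
})
long_df = add_start_stop(toy)
print(long_df)
```

Unlike the snippet above, this version does not modify the input frame in place, which makes it easier to compare before and after.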

The example you have mentioned above can be solved using these steps:

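import pandas as pd

dd = pd.merge(df, cv, how='left', on='id')
dd['start'] = dd.groupby('id')['t_y'].shift().fillna(0)
dd['stop'] = dd[['t_x', 't_y']].max(axis=1)
# keep E only on the last row of each id; recent pandas rejects a set as an indexer
last_rows = dd.groupby('id')['E'].tail(1).index
dd.loc[~dd.index.isin(last_rows), 'E'] = 0
# drop returns a new frame, so assign the result
dd = dd.drop(['t_x','t_y'], axis=1)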

But you need to check this. For instance, I have assumed that the time in df is never lower than the times in cv for the same id. Also, cv needs to be sorted by id and time first in case it is given unsorted.
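
A minimal sketch of those two checks, assuming df and cv both carry a time column named t (which is what would produce the t_x and t_y columns after the merge). The column name var and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical inputs: one row per subject in df, several covariate rows in cv.
df = pd.DataFrame({"id": [1, 2], "t": [10, 7], "E": [1, 0]})
cv = pd.DataFrame({"id": [2, 1, 1], "t": [3, 6, 2], "var": [0.5, 0.2, 0.1]})

# Sort cv by subject and time so that a per-id shift() sees rows in time order.
cv_sorted = cv.sort_values(["id", "t"]).reset_index(drop=True)

# Check the assumption: no covariate time is later than the subject's time in df.
last_cv_time = cv_sorted.groupby("id")["t"].max()
df_time = df.set_index("id")["t"].reindex(last_cv_time.index)
violations = last_cv_time[last_cv_time > df_time]
assumption_holds = violations.empty
print(assumption_holds)
```

If violations is not empty, its index lists the ids whose covariate times run past the time recorded in df, and those rows need to be handled before merging.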