[ENH] Keeping a history of every operation applied to a DataFrame

A whacky idea dreamed up during the sprint.

This would essentially record your data processing pipeline for you (the DataFrame part of it, at least). Metadata could live in a .pj accessor on the DataFrame / Series.
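A minimal sketch of what such a .pj accessor could look like, using pandas' public accessor registration hook. The class name `HistoryAccessor` and its `run` and `history` members are hypothetical names chosen for illustration; nothing here is pyjanitor's actual implementation.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("pj")
class HistoryAccessor:
    """Hypothetical accessor that remembers which functions produced a DataFrame."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj
        self._history = []

    @property
    def history(self):
        # Return a copy so callers cannot mutate the stored record.
        return list(self._history)

    def run(self, func, *args, **kwargs):
        """Call func(df, *args, **kwargs) and carry the history onto the result."""
        result = func(self._obj, *args, **kwargs)
        if isinstance(result, pd.DataFrame):
            name = getattr(func, "__name__", repr(func))
            result.pj._history = self._history + [name]
        return result


df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4, 5, 6]})
out = (
    df.pj.run(pd.DataFrame.dropna)
      .pj.run(pd.DataFrame.rename, columns={"a": "x"})
)
print(out.pj.history)
```

The history only survives when every step goes through `run`: calling a plain pandas method directly returns a new DataFrame with a fresh, empty accessor, which is exactly the "detect all pandas functions" challenge listed in the issue.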

Fun challenges include:

  • Detecting use of all pandas functions, not just pyjanitor ones
  • Minimizing modification of Pandas objects as much as possible
  • Avoiding brittleness to pandas code updates
  • Handling multi-dataframe operations without losing computation metadata

Thoughts on how this could be possible would be nice. Dream big.

@szuckerman @ericmjl @HectorM14

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

3 reactions
jcmkk3 commented, Jul 15, 2019

This is not exactly the same thing, but since you’re already thinking about a scope beyond the methods pyjanitor adds, it would be really cool to see something like tidylog for pandas.
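To make the tidylog idea concrete, here is a rough sketch of a helper that reports how a step changed a DataFrame's shape. The names `summarize` and `logged` and the message format are assumptions made for this example; they are not taken from tidylog or from any pandas library.

```python
import functools

import pandas as pd


def summarize(name, before, after):
    """Describe how `after` differs in shape from `before` (hypothetical format)."""
    parts = []
    row_diff = len(after) - len(before)
    col_diff = after.shape[1] - before.shape[1]
    if row_diff:
        verb = "added" if row_diff > 0 else "removed"
        parts.append(f"{verb} {abs(row_diff)} row(s)")
    if col_diff:
        verb = "added" if col_diff > 0 else "removed"
        parts.append(f"{verb} {abs(col_diff)} column(s)")
    if not parts:
        parts.append("shape unchanged")
    return f"{name}: {', '.join(parts)}; now {after.shape[0]} x {after.shape[1]}"


def logged(func):
    """Decorator for DataFrame -> DataFrame functions that prints a summary."""
    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        result = func(df, *args, **kwargs)
        print(summarize(func.__name__, df, result))
        return result
    return wrapper


@logged
def drop_missing(df):
    return df.dropna()


df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4, 5, 6]})
clean = df.pipe(drop_missing)
message = summarize("dropna", df, df.dropna())
```

This only covers functions the user decorates explicitly, so it sidesteps rather than solves the problem of observing every pandas call.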

1 reaction
ericmjl commented, Dec 17, 2019

Friends, I think @eyaltrabelsi, a contributor to pyjanitor, has done some wonderful work with pandas-log related to this issue! The programming model is very good: it uses a context manager, so that we can selectively enable logging at the appropriate places.
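The context-manager model can be illustrated with a small sketch that temporarily wraps a few DataFrame methods and restores them on exit. This is not pandas-log's API: `log_operations`, its `methods` parameter, and the log tuple layout are all hypothetical, and the approach relies on monkeypatching, which is exactly the kind of brittleness the issue warns about.

```python
import contextlib
import functools

import pandas as pd


@contextlib.contextmanager
def log_operations(methods=("dropna", "query", "assign", "merge")):
    """Record (method, shape_before, shape_after) for selected DataFrame methods.

    Hypothetical helper: the methods are patched on pd.DataFrame only while
    the `with` block is active, then the originals are put back.
    """
    log = []
    originals = {name: getattr(pd.DataFrame, name) for name in methods}

    def make_wrapper(name, original):
        @functools.wraps(original)
        def wrapper(self, *args, **kwargs):
            result = original(self, *args, **kwargs)
            if isinstance(result, pd.DataFrame):
                log.append((name, self.shape, result.shape))
            return result
        return wrapper

    try:
        for name, original in originals.items():
            setattr(pd.DataFrame, name, make_wrapper(name, original))
        yield log
    finally:
        # Restore the original methods even if the block raised.
        for name, original in originals.items():
            setattr(pd.DataFrame, name, original)


df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4, 5, 6]})
with log_operations() as log:
    out = df.dropna().query("b > 4").assign(c=1)

for entry in log:
    print(entry)
```

Outside the `with` block the DataFrame methods behave as before, so logging stays opt-in and scoped to the code that needs it.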
