[ENH] Keeping a history of every operation applied to a DataFrame
A whacky idea dreamed up during the sprint.
This would essentially record your data processing pipeline for you (at least the DataFrame part of it). Metadata could go in a .pj accessor on the DataFrame / Series.
Fun challenges include:
- Detecting use of all pandas functions, not just pyjanitor ones
- Minimizing modification of Pandas objects as much as possible
- Avoiding brittleness to pandas code updates
- Handling multi-dataframe operations without losing computation metadata
Thoughts on how this could be possible would be nice. Dream big.
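To make the .pj accessor idea concrete, here is a minimal sketch, not a proposed implementation. It assumes pandas' `register_dataframe_accessor` extension hook and the experimental `DataFrame.attrs` dict as the place to keep metadata. The names `HistoryAccessor`, `record`, and `HISTORY_KEY` are hypothetical. It only records calls that are explicitly routed through `record`, so it does not address the "detect all pandas functions" challenge.

```python
import pandas as pd

# Hypothetical key under which the history is kept in DataFrame.attrs.
HISTORY_KEY = "pj_history"


@pd.api.extensions.register_dataframe_accessor("pj")
class HistoryAccessor:
    """Hypothetical accessor that records method calls made through it."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def history(self):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._obj.attrs.get(HISTORY_KEY, []))

    def record(self, method_name, *args, **kwargs):
        # Call the named DataFrame method, then attach the extended
        # history to the result instead of modifying the input frame.
        result = getattr(self._obj, method_name)(*args, **kwargs)
        if isinstance(result, pd.DataFrame):
            entry = (method_name, args, kwargs)
            result.attrs[HISTORY_KEY] = self.history + [entry]
        return result


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
out = df.pj.record("rename", columns={"a": "x"}).pj.record("head", 2)

for step in out.pj.history:
    print(step)
```

Keeping the history in `attrs` avoids subclassing DataFrame, but `attrs` is documented as experimental and is not guaranteed to survive every operation, which is exactly the multi-dataframe problem listed above.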
Issue Analytics
- State:
- Created 4 years ago
- Comments: 8 (3 by maintainers)
This is not exactly the same thing, but since you’re already thinking about a scope beyond just the methods pyjanitor adds, it would be really cool to see something like tidylog for pandas.
Friends, I think @eyaltrabelsi, a contributor to pyjanitor, has done some wonderful work with pandas-log related to this issue! The programming model is very good: it uses a context manager, so we can selectively turn on logging only in the places where we want it.
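For readers unfamiliar with the context-manager pattern, here is a rough sketch of the general idea. It is not pandas-log's actual API or implementation. The function name `log_dataframe_ops` and its parameters are hypothetical. It temporarily wraps a few DataFrame methods and restores them on exit, which also shows why this kind of patching is sensitive to pandas internals.

```python
import contextlib
import functools

import pandas as pd


@contextlib.contextmanager
def log_dataframe_ops(method_names=("dropna", "merge"), log=None):
    """Hypothetical helper: log shape changes for selected methods."""
    log = [] if log is None else log
    originals = {}

    def make_wrapper(name, original):
        @functools.wraps(original)
        def wrapper(self, *args, **kwargs):
            result = original(self, *args, **kwargs)
            if isinstance(result, pd.DataFrame):
                log.append(f"{name}: {self.shape} -> {result.shape}")
            return result

        return wrapper

    # Patch the selected methods on the DataFrame class.
    for name in method_names:
        originals[name] = getattr(pd.DataFrame, name)
        setattr(pd.DataFrame, name, make_wrapper(name, originals[name]))
    try:
        yield log
    finally:
        # Always restore the original methods, even if an error occurred.
        for name, original in originals.items():
            setattr(pd.DataFrame, name, original)


original_dropna = pd.DataFrame.dropna
frame = pd.DataFrame({"a": [1.0, None, 3.0]})

with log_dataframe_ops() as messages:
    cleaned = frame.dropna()  # logged

untracked = frame.dropna()  # outside the block, not logged
print(messages)
```

Only calls made inside the `with` block are recorded, so logging stays opt-in and the pandas objects themselves are left unmodified afterwards.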