RFC: variable expressions in Plot
I would like to add functionality to assign variables using an expression that has the source data as a namespace. This will provide functionality similar to R’s nonstandard evaluation. Because Python does not have this concept, the space of possible approaches is less magical than what you can get in R (although it also avoids all of the complications that nonstandard evaluation introduces).
As a motivating example, say we want to use the tip rate (tip / total_bill) in the tips dataset.
There are a few options here, and I’m undecided as to which would be best:
1. Function that accepts a dataframe and returns a series
so.Plot(tips, x=lambda d: d["tip"] / d["total_bill"])
(or in some cases)
so.Plot(tips, x=lambda d: d.tip / d.total_bill)
(+) Explicit and relatively easy to explain
(+) Can pass a closure over other objects in the outer scope
(+) Doesn’t assume that data is a pandas.DataFrame
(–) Repetitive and a little clumsy
(–) Hard to get a nicely-formatted name (i.e. for axis labels)
(–) Would be impossible to serialize
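To make the trade-offs of option 1 concrete, here is a minimal sketch using only pandas. The small tips frame and the scale variable are made up for illustration, and nothing here calls seaborn. It shows the closure over an outer-scope object and why the result has no usable name for an axis label.

```python
import pandas as pd

# Made-up stand-in for the tips dataset (illustration only).
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [1.0, 3.0, 10.0]})

# An outer-scope object that the lambda closes over.
scale = 100

# The kind of callable option 1 would accept: data in, values out.
tip_pct = lambda d: scale * d["tip"] / d["total_bill"]

values = tip_pct(tips)
print(values.tolist())

# Combining two differently named columns leaves the result unnamed,
# so there is nothing to derive an axis label from.
print(values.name)
```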
2. Custom object that wraps an expression passed to DataFrame.eval
so.Plot(tips, x=so.Expr("tip / total_bill"))
(+) Less repetitive (data is implicit)
(+) Can get a nice name
(+) Could be serialized
(–) Introduces a new type of seaborn object that’s a little hard to explain
(–) Somewhat verbose
(–) Programming in strings means linters won’t work
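As a rough sketch of what option 2 could look like internally, the class below wraps an expression string and evaluates it with DataFrame.eval. The Expr class, its source field, and its evaluate method are hypothetical stand-ins written for this example; so.Expr does not exist in seaborn. The sketch shows where the nice name and the serializability would come from.

```python
import json
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for the proposed so.Expr (not a seaborn API)."""

    source: str

    def evaluate(self, data: pd.DataFrame) -> pd.Series:
        # DataFrame.eval resolves bare names against the frame's columns.
        # Reusing the expression text as the name gives a readable label.
        return data.eval(self.source).rename(self.source)


# Made-up stand-in for the tips dataset (illustration only).
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [1.0, 3.0, 10.0]})

expr = Expr("tip / total_bill")
rate = expr.evaluate(tips)
print(rate.name)

# The expression is plain data, so it round-trips through JSON.
payload = json.dumps(asdict(expr))
restored = Expr(**json.loads(payload))
```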
3. Lambda that returns an expression passed to DataFrame.eval
so.Plot(tips, x=lambda: "tip / total_bill")
(+) Least verbose
(+) Can get a nice name
(+) Could support serialization with some extra internal handling
(–) Abuses the purpose of lambdas and may be confusing
(–) Programming in strings means linters won’t work
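One way the "extra internal handling" for option 3 might work is to dispatch on the callable's arity: a zero-argument callable is assumed to return an expression string, while a one-argument callable is treated as in option 1. The resolve_variable helper below is hypothetical and only illustrates that idea with pandas; it is not how seaborn resolves variables.

```python
import inspect

import pandas as pd


def resolve_variable(data, spec):
    """Hypothetical resolver: column name, f(data), or zero-arg expression lambda."""
    if callable(spec):
        if len(inspect.signature(spec).parameters) == 0:
            # Option 3: the lambda only delays a string meant for DataFrame.eval.
            expr = spec()
            return data.eval(expr).rename(expr)
        # Option 1: the callable computes the values from the data itself.
        return spec(data)
    # Plain column lookup.
    return data[spec]


# Made-up stand-in for the tips dataset (illustration only).
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [1.0, 3.0, 10.0]})

from_string = resolve_variable(tips, lambda: "tip / total_bill")
from_function = resolve_variable(tips, lambda d: d["tip"] / d["total_bill"])
print(from_string.name, from_function.name)
```

Both calls produce the same values, but only the string form carries a name, which is the main practical difference between options 1 and 3.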
Top GitHub Comments
From a usability perspective I’d favour either 1 or 3. But 1 is verbose enough that I’m more likely to do that transform directly on the DF and then pass it to the plotting func. So I’d say 3 offers something uniquely appealing in its brevity and ‘magic’.
Yeah. Being able to pass in the columns as arguments to the lambda is mostly helpful if you’re able to use short variable names to write what feels like more mathematical formulas. It is especially useful if you’re reusing the same variable multiple times in the formula. In any case, it could be provided as a helper that creates a function compatible with the first option.
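A helper of that kind could be sketched roughly as follows. The with_columns name is made up for this example; it reads the lambda's parameter names, looks them up as columns, and returns a function of the data that would fit the first option.

```python
import inspect

import pandas as pd


def with_columns(func):
    """Hypothetical helper: adapt f(col_a, col_b, ...) into f(data)."""
    names = list(inspect.signature(func).parameters)

    def wrapper(data):
        # Each parameter name is looked up as a column in the data.
        return func(*(data[name] for name in names))

    return wrapper


# Made-up stand-in for the tips dataset (illustration only).
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [1.0, 3.0, 10.0]})

# Short argument names keep the formula readable, even when a column is reused.
tip_rate = with_columns(lambda tip, total_bill: tip / total_bill)
rate = tip_rate(tips)
print(rate.tolist())
```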
An example of where something like that would come in handy is the skew calculation in the arquero example below. Mind you, that is just destructuring syntax in JavaScript and isn’t something that was created specifically for arquero.