RFC: variable expressions in Plot

I would like to add functionality to assign variables using an expression that has the source data as a namespace. This will provide functionality similar to R’s nonstandard evaluation. Because Python does not have this concept, the space of possible approaches is less magical than what you can get in R (although it also avoids all of the complications that nonstandard evaluation introduces).

There are a few options here, and I’m undecided as to which would be best.

As a motivating example, say we want to use the tip rate (tip / total_bill) in the tips dataset.

1. Function that accepts a dataframe and returns a series

so.Plot(tips, x=lambda d: d["tip"] / d["total_bill"])

(or in some cases)

so.Plot(tips, x=lambda d: d.tip / d.total_bill)

(+) Explicit and relatively easy to explain
(+) Can pass a closure over other objects in the outer scope
(+) Doesn’t assume that data is a pandas.DataFrame
(–) Repetitive and a little clumsy
(–) Hard to get a nicely-formatted name (i.e. for axis labels)
(–) Would be impossible to serialize
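To make option 1 concrete, here is a minimal sketch of how a plotting layer might resolve such a callable. It is pandas-only: `resolve_variable` is a hypothetical helper (not part of seaborn), and the small `tips` frame is made-up data standing in for the real dataset.

```python
import pandas as pd

# Toy stand-in for the tips dataset (values are made up for illustration).
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [1.0, 3.0, 8.0]})


def resolve_variable(data, spec):
    """Hypothetical helper: turn a variable spec into a column of values."""
    if callable(spec):
        # Option 1: call the function with the source data.
        return spec(data)
    # Otherwise treat the spec as a column name.
    return data[spec]


tip_rate = resolve_variable(tips, lambda d: d["tip"] / d["total_bill"])
print(tip_rate.tolist())  # [0.1, 0.15, 0.2]
print(tip_rate.name)      # None
```

The derived series comes back without a name, which illustrates the axis-label drawback listed above.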

2. Custom object that wraps an expression passed to `DataFrame.eval`

so.Plot(tips, x=so.Expr("tip / total_bill"))

(+) Less repetitive (data is implicit)
(+) Can get a nice name
(+) Could be serialized
(–) Introduces a new type of seaborn object that’s a little hard to explain
(–) Somewhat verbose
(–) Programming in strings means linters won’t work
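A rough sketch of what such a wrapper could look like, assuming it simply defers to `DataFrame.eval`. The `Expr` class below is a hypothetical illustration, not an existing seaborn object, and the `tips` frame is made-up data.

```python
import pandas as pd

# Toy stand-in for the tips dataset (values are made up for illustration).
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [1.0, 3.0, 8.0]})


class Expr:
    """Hypothetical wrapper around an expression string."""

    def __init__(self, expr):
        self.expr = expr

    def __call__(self, data):
        # Evaluate the expression with the dataframe's columns as the namespace,
        # then reuse the expression text as the name (e.g. for an axis label).
        return data.eval(self.expr).rename(self.expr)

    def __repr__(self):
        # The object is fully described by its string, so it is easy to serialize.
        return f"Expr({self.expr!r})"


rate = Expr("tip / total_bill")(tips)
print(rate.name)  # tip / total_bill
```

Because the whole specification is a string, both the "nice name" and the serialization advantages fall out of the same attribute.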

3. Lambda that returns an expression passed to `DataFrame.eval`

so.Plot(tips, x=lambda: "tip / total_bill")

(+) Least verbose
(+) Can get a nice name
(+) Could support serialization with some extra internal handling
(–) Abuses the purpose of lambdas and may be confusing
(–) Programming in strings means linters won’t work
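If options 1 and 3 were both supported, the library would have to tell the two kinds of callable apart. One possible approach, sketched here as an assumption rather than a proposed implementation, is to dispatch on the number of parameters; `resolve_callable` is a hypothetical name and `tips` is made-up data.

```python
import inspect

import pandas as pd

# Toy stand-in for the tips dataset (values are made up for illustration).
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [1.0, 3.0, 8.0]})


def resolve_callable(data, func):
    """Hypothetical dispatch between option 1 and option 3 callables."""
    n_params = len(inspect.signature(func).parameters)
    if n_params == 0:
        # Option 3: a zero-argument lambda yields an expression string,
        # which is evaluated against the data and doubles as the name.
        expr = func()
        return data.eval(expr).rename(expr)
    # Option 1: the callable receives the data directly.
    return func(data)


from_string = resolve_callable(tips, lambda: "tip / total_bill")
from_frame = resolve_callable(tips, lambda d: d["tip"] / d["total_bill"])
print(from_string.name)  # tip / total_bill
print(from_frame.name)   # None
```

The arity check is exactly the kind of implicit rule that may make this option confusing to explain.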


Top GitHub Comments

philsheard commented, Oct 22, 2022 (3 reactions)

From a usability perspective I’d favour either 1 or 3. But 1 is verbose enough that I’m more likely to do that transform directly on the DF and then pass it to the plotting func. So I’d say 3 offers something uniquely appealing in its brevity and ‘magic’.

jcmkk3 commented, Oct 22, 2022 (1 reaction)

> But when writing up this issue was having trouble articulating the case for this over option 1.

Yeah. Being able to pass in the columns as arguments to the lambda is mostly helpful if you’re able to use short variable names to write what feels like more mathematical formulas. It is especially useful if you’re reusing the same variable multiple times in the formula. It could always be provided as a helper that creates a function compatible with the 1st option, anyway.

An example of where something like that would come in handy is the skew calculation in the arquero example below. Mind you, that is just destructuring syntax in JavaScript and isn’t something that was created specifically for arquero.

// Reshape (fold) the data to a two column layout: city, sun.
dt.fold(aq.all(), { as: ['city', 'sun'] })
  .groupby('city')
  .rollup({
    min:  d => op.min(d.sun), // functional form of op.min('sun')
    max:  d => op.max(d.sun),
    avg:  d => op.average(d.sun),
    med:  d => op.median(d.sun),
    // functional forms permit flexible table expressions
    skew: ({sun: s}) => (op.mean(s) - op.median(s)) / op.stdev(s) || 0
  })
  .objects()
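
For comparison, the helper mentioned in this comment, one that builds a function compatible with the 1st option from a lambda whose arguments are column names, might look roughly like this in Python. `with_columns` is a hypothetical name and `tips` is made-up data. Python has no destructuring with renaming, so in this sketch the argument names must match the column names exactly.

```python
import inspect

import pandas as pd

# Toy stand-in for the tips dataset (values are made up for illustration).
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [1.0, 3.0, 8.0]})


def with_columns(func):
    """Hypothetical helper: adapt func(col_a, col_b, ...) to func(data)."""
    names = list(inspect.signature(func).parameters)

    def wrapper(data):
        # Look up each argument name as a column and pass the columns in order.
        return func(*(data[name] for name in names))

    return wrapper


# Produces a function of the dataframe, as in option 1.
tip_rate = with_columns(lambda tip, total_bill: tip / total_bill)(tips)

# Reusing one variable several times, in the spirit of the skew example.
skew = with_columns(lambda tip: (tip.mean() - tip.median()) / tip.std())(tips)
print(tip_rate.tolist())  # [0.1, 0.15, 0.2]
```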
