RFC: variable expressions in Plot

I would like to add functionality to assign variables using an expression that has the source data as a namespace. This will provide functionality similar to R’s nonstandard evaluation. Because Python does not have this concept, the space of possible approaches is less magical than what you can get in R (although it also avoids all of the complications that nonstandard evaluation introduces).

There are a few options here, and I’m undecided as to which would be best.

As a motivating example, say we want to use the tip rate (tip / total_bill) in the tips dataset.

1. Function that accepts a dataframe and returns a series

so.Plot(tips, x=lambda d: d["tip"] / d["total_bill"])

(or in some cases)

so.Plot(tips, x=lambda d: d.tip / d.total_bill)

(+) Explicit and relatively easy to explain
(+) Can pass a closure over other objects in the outer scope
(+) Doesn’t assume that data is a pandas.DataFrame
(–) Repetitive and a little clumsy
(–) Hard to get a nicely-formatted name (i.e. for axis labels)
(–) Would be impossible to serialize
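
A minimal sketch of how option 1 might be resolved, assuming a hypothetical `resolve_variable` helper (this is not seaborn's actual implementation, which the thread does not show). It uses a plain dict of lists to illustrate that the callable form does not require a pandas.DataFrame, and a closure over `scale` from the outer scope.

```python
from typing import Any, Callable, Union

# Hypothetical names for illustration only; nothing here is seaborn API.
VariableSpec = Union[str, Callable[[Any], Any]]


def resolve_variable(data: Any, spec: VariableSpec) -> Any:
    """Return the values for one plot variable.

    A callable spec receives the whole data object; any other spec is
    treated as a plain key/column lookup.
    """
    if callable(spec):
        return spec(data)
    return data[spec]


# Any container the callable understands will do, here a dict of lists.
tips = {"tip": [1.0, 3.0], "total_bill": [10.0, 12.0]}

scale = 100  # outer-scope object captured by the closure below
tip_pct = resolve_variable(
    tips,
    lambda d: [scale * t / b for t, b in zip(d["tip"], d["total_bill"])],
)
plain = resolve_variable(tips, "tip")
```

The naming drawback is visible here too: a lambda's `__name__` is always `"<lambda>"`, so there is nothing useful to derive an axis label from.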

2. Custom object that wraps an expression passed to DataFrame.eval

so.Plot(tips, x=so.Expr("tip / total_bill"))

(+) Less repetitive (data is implicit)
(+) Can get a nice name
(+) Could be serialized
(–) Introduces a new type of seaborn object that’s a little hard to explain
(–) Somewhat verbose
(–) Programming in strings means linters won’t work
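
A rough sketch of what such a wrapper might look like, assuming a hypothetical `Expr` class with `name` and `evaluate` members (these names are invented for illustration; only `DataFrame.eval` is existing pandas API). The small frame stands in for the tips dataset.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for the proposed so.Expr; not a real seaborn class."""

    source: str

    @property
    def name(self) -> str:
        # The expression text doubles as a readable default axis label.
        return self.source

    def evaluate(self, data: pd.DataFrame) -> pd.Series:
        # DataFrame.eval resolves bare names in the string as columns.
        return data.eval(self.source)


# Toy data with the same column names as the tips dataset.
tips = pd.DataFrame({"tip": [1.0, 3.0], "total_bill": [10.0, 12.0]})

expr = Expr("tip / total_bill")
tip_rate = expr.evaluate(tips)
```

Because the object only holds a string, it could be written out and restored without pickling any code, which is the serialization advantage listed above.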

3. Lambda that returns an expression passed to DataFrame.eval

so.Plot(tips, x=lambda: "tip / total_bill")

(+) Least verbose
(+) Can get a nice name
(+) Could support serialization with some extra internal handling
(–) Abuses the purpose of lambdas and may be confusing
(–) Programming in strings means linters won’t work

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 4
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

3 reactions
philsheard commented, Oct 22, 2022

From a usability perspective I’d favour either 1 or 3. But 1 is verbose enough that I’m more likely to do that transform directly on the DF and then pass it to the plotting func. So I’d say 3 offers something uniquely appealing in its brevity and ‘magic’.

1 reaction
jcmkk3 commented, Oct 22, 2022

But when writing up this issue was having trouble articulating the case for this over option 1.

Yeah. Being able to pass in the columns as arguments to the lambda is mostly helpful if you’re able to use short variable names to write what feels like more mathematical formulas. It is especially useful if you’re reusing the same variable multiple times in the formula. It could always be offered as a helper that creates a function compatible with the 1st option, anyway.

An example of where something like that would come in handy would be the skew calculation in the arquero example below. Mind you, that is just destructuring syntax in JavaScript and isn’t something that was created specifically for arquero.

// Reshape (fold) the data to a two column layout: city, sun.
dt.fold(aq.all(), { as: ['city', 'sun'] })
  .groupby('city')
  .rollup({
    min:  d => op.min(d.sun), // functional form of op.min('sun')
    max:  d => op.max(d.sun),
    avg:  d => op.average(d.sun),
    med:  d => op.median(d.sun),
    // functional forms permit flexible table expressions
    skew: ({sun: s}) => (op.mean(s) - op.median(s)) / op.stdev(s) || 0
  })
  .objects()
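
In Python, the helper idea mentioned above might look roughly like this. `by_columns` is a hypothetical name, and the sketch assumes each parameter of the wrapped function is named after a column; unlike the JavaScript destructuring form, it cannot rename a column to a shorter alias.

```python
import inspect
from typing import Any, Callable


def by_columns(func: Callable[..., Any]) -> Callable[[Any], Any]:
    """Hypothetical helper: adapt a function of columns to a function of data.

    Each parameter name of `func` is looked up as a key/column in the data,
    so the result has the one-argument shape that option 1 expects.
    """
    names = list(inspect.signature(func).parameters)

    def from_data(data: Any) -> Any:
        return func(**{name: data[name] for name in names})

    return from_data


# Parameter names must match the column names in the data.
tip_rate = by_columns(lambda tip, total_bill: tip / total_bill)

row = {"tip": 2.0, "total_bill": 10.0}  # any mapping works, not only a DataFrame
result = tip_rate(row)
```

The returned function takes the data as its single argument, so it could be passed anywhere the `lambda d: ...` form from option 1 is accepted.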