
ux: problems with ibis' `_` convenience API for deferred attribute resolution


#3804 introduced a nice way to chain several operations that use new columns not originally part of the input expression:

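```python
import pandas as pd
import ibis
from ibis import _

table = ibis.memtable(pd.DataFrame({"x": list("12213")}))
(
    # `_` resolves against whichever table each method is called on, so later
    # steps can refer to columns (a, b, mean_b) created by earlier steps
    table.mutate(a=2 * _.x, b=_.x.cast("float64"))
    .group_by(_.a)
    .aggregate(count=_.b.count(), mean_b=_.b.mean())
    .order_by(_.mean_b)
).execute()
```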

In particular, this makes it very easy to factor complex logic into functions that add or rename columns, and then use the `.pipe` method to chain them.

Choosing `_` as the name of the importable deferred attribute resolver is nice because it leads to quite concise and readable code. However, I also find it problematic from a UX point of view, for the following reasons:

a) it’s not easily googleable: someone not familiar with this idiom will have a hard time searching for it on Google or Stack Overflow.
b) it can conflict with the `_` variable of Jupyter notebooks (which holds the output of the last executed cell), and can therefore lead to very confusing error messages for unsuspecting users doing exploratory data analysis with Ibis in a Jupyter notebook with multiple cells.
c) even in regular, non-interactive Python code, it can conflict with the common idiom of assigning ignored values (unused function call results, loop counters) to a dummy `_` variable to express that we do not need a name for an ancillary value. For example:

```python
a, _ = function_returning_a_pair_of_values()  # ignore the second value

for _ in range(n_trials):
    if attempt():
        break
else:
    raise Exception(f"{n_trials} consecutive failed attempts")
```

Possible solutions

A solution for a) alone could be to change the default name of the singleton, while still allowing users to keep using the `_` idiom later in their code:

```python
from ibis import deferred_resolver as _

# continue as previously
```

At least people reading this code for the first time would understand, just by reading the import, that `_` is some kind of deferred attribute resolver. No complex googling or random documentation scanning required.

I don’t have good solutions for b) and c), but here are some ideas:

Suggest (by convention) naming the resolver `c` instead of `_`

I picked `c` for “column” because most of the time the attribute lookup is a column lookup. However, a variable named `c` might still frequently conflict with user code. Adapting the original example would lead to code that looks like this:

```python
import pandas as pd
import ibis
from ibis import deferred_resolver as c


table = ibis.memtable(pd.DataFrame({"x": list("12213")}))
(
    table.mutate(a=2 * c.x, b=c.x.cast("float64"))
    .group_by(c.a)
    .aggregate(count=c.b.count(), mean_b=c.b.mean())
    .order_by(c.mean_b)
).execute()
```

Or we could be even more creative and use 𓅠, the Unicode symbol for the ibis (bird) hieroglyph:

```python
import pandas as pd
import ibis
from ibis import deferred_resolver as 𓅠


table = ibis.memtable(pd.DataFrame({"x": list("12213")}))
(
    table.mutate(a=2 * 𓅠.x, b=𓅠.x.cast("float64"))
    .group_by(𓅠.a)
    .aggregate(count=𓅠.b.count(), mean_b=𓅠.b.mean())
    .order_by(𓅠.mean_b)
).execute()
```

The latter suggestion is more of a joke because:

- it can break code editors / readers that do not render non-ASCII Unicode symbols properly;
- it’s cumbersome to type such code without setting up some kind of OS-level custom keyboard mapping or user-defined code snippets.

Issue Analytics

Created a year ago
Comments: 8 (4 by maintainers)

Top GitHub Comments


saulpw commented, Oct 25, 2022

Hey @ogrisel, these are good points, thanks for bringing them up. I agree that _ is problematic for the reasons you mentioned. (Thanks also @jmckk for chiming in.)

To add some other possible approaches, let me say that we’ve been considering similar functionality for join expressions, which need to refer to both the left and the right tables being joined. The obvious choices here are something like `L` and `R` (or `_L` or `L_`); this might suggest an equivalent `C` (or `X`) as a replacement for `_`. AFAIK these single capital-letter symbols wouldn’t conflict with any mainstream Python idioms or platforms.
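One way to picture the `L`/`R` idea is that each symbol remembers which side of the join it refers to. A comparison such as `L.customer_id == R.customer_id` then builds a predicate that is only evaluated once the join has both tables. The sketch below is a toy in plain Python over lists of dicts. `Side`, `inner_join` and the sample data are hypothetical, and the sketch does not reflect how ibis implements joins.

```python
from __future__ import annotations

from typing import Any, Callable


class Side:
    """Toy deferred reference to one side of a join (hypothetical L/R)."""

    def __init__(self, index: int, column: str | None = None) -> None:
        self.index = index    # 0 = left table, 1 = right table
        self.column = column  # column name, filled in by attribute access

    def __getattr__(self, name: str) -> Side:
        # `L.customer_id` -> "column customer_id of the left table"
        return Side(self.index, name)

    def resolve(self, left_row: dict, right_row: dict) -> Any:
        return (left_row, right_row)[self.index][self.column]

    def __eq__(self, other: Side) -> Callable[[dict, dict], bool]:  # type: ignore[override]
        # Build a predicate instead of comparing immediately.
        return lambda lrow, rrow: self.resolve(lrow, rrow) == other.resolve(lrow, rrow)


L, R = Side(0), Side(1)


def inner_join(left: list[dict], right: list[dict], predicate) -> list[dict]:
    # Nested-loop join: the predicate is evaluated for every pair of rows.
    return [{**lrow, **rrow} for lrow in left for rrow in right if predicate(lrow, rrow)]


orders = [{"order_id": 1, "customer_id": 10}, {"order_id": 2, "customer_id": 20}]
customers = [{"customer_id": 10, "name": "Ada"}]

joined = inner_join(orders, customers, L.customer_id == R.customer_id)
print(joined)  # [{'order_id': 1, 'customer_id': 10, 'name': 'Ada'}]
```

`L` and `R` are only resolved when the join runs. The same predicate could therefore be reused for any pair of tables that both have a `customer_id` column.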

This still doesn’t make them searchable, but we want to keep the identifier extremely short anyway (one or two characters at most), so maybe that is not possible. We should still come up with a memorable and searchable name that is part of its docstring and is mentioned everywhere we mention this feature.

A Unicode character is clever but I agree, it’s a non-starter if it can’t be typed on a standard keyboard in every locale 😃

As for your side UX problem: in my opinion, the repr of an Ibis expression should in general generate an identical (or equivalent) Ibis expression string, like a quine. This would open up some really interesting use cases, but it is also a large amount of work. We might be able to do something more helpful in this smaller case, which may be easier and would also push us in this more general direction.


ogrisel commented, Nov 28, 2022

> You can see how it starts with providing completions for all of the different methods on columns, then after I use a method that is specific to a certain type, it only completes those since it knows a more specific type.

Interesting. I suppose that the type hints will make this work even when no interactive Python shell backs the code editor (e.g. editing a .py file in VS Code, and not just for .ipynb files).

However, that won’t solve the problem of suggesting the right column names when typing chained expressions that refer to columns added earlier in the chain (e.g. via mutate or agg).
