Feature request: dynamically computed columns in tables

One feature I’d like to see in tables is the ability to create ‘derived’ columns that are evaluated on-the-fly using other columns. Take the following example:

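from numpy import pi
from astropy import units as u
from astropy.constants import sigma_sb
from astropy.table import QTable

t = QTable()
t['spectral type'] = ['O5', 'B5', 'A5', 'F5', 'G5']
t['radius'] = [12, 3.9, 1.7, 1.3, 0.92] * u.R_sun
t['temperature'] = [45000, 15000, 8200, 6400, 5700] * u.K
# Stefan-Boltzmann law, evaluated once: the result is stored as a static column
t['luminosity'] = 4 * pi * t['radius'] ** 2 * sigma_sb * t['temperature'] ** 4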

If I now add a row with:

t.add_row({'spectral type': 'K5',
           'temperature': 4300 * u.K,
           'radius': 0.72 * u.R_sun})

Then luminosity won’t be computed for the new row. More generally, for large tables, it would be nice to have lazily evaluated new columns to avoid taking up too much memory.

Since we of course don’t want to break backward-compatibility, one way we could do this is by having a way to indicate that a column should be accessed as a reference to be used in an arithmetic operation rather than as values. This could be done for example using:

t['luminosity'] = 4 * pi * t.ref['radius'] ** 2 * sigma_sb * t.ref['temperature'] ** 4

There might be other ways to do this of course, and this is just an idea.

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

astrofrog commented, Oct 22, 2018

You don’t need to parse the syntax, since Python is doing it for you - we’d just need to define operators on columns such that e.g. Column <op> constant or Column <op> Column could return a CompositeColumn which takes two terms (e.g. column and scalar or column and column) and an operator. The above expression can then be evaluated as a nested set of such CompositeColumn objects. Maybe @maartenbreddels has some ideas here too since vaex supports lazy evaluations.
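
To make the idea concrete, here is a minimal sketch in plain Python of how operator overloading could build such an expression tree. It is not astropy code: `LazyExpr`, `ColumnRef`, the `evaluate` method and the dict-of-lists table are all hypothetical names chosen for illustration, and units and constants such as sigma_sb are left out to keep it short.

```python
import operator


def _apply(op, a, b):
    """Apply op element-wise, broadcasting scalars against lists."""
    a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
    if a_is_list and b_is_list:
        return [op(x, y) for x, y in zip(a, b)]
    if a_is_list:
        return [op(x, b) for x in a]
    if b_is_list:
        return [op(a, y) for y in b]
    return op(a, b)


def _evaluate(term):
    """Evaluate a lazy term; plain scalars evaluate to themselves."""
    return term.evaluate() if isinstance(term, LazyExpr) else term


class LazyExpr:
    """Arithmetic on a LazyExpr builds a tree instead of computing values."""

    def __add__(self, other):
        return CompositeColumn(self, other, operator.add)

    def __radd__(self, other):
        return CompositeColumn(other, self, operator.add)

    def __mul__(self, other):
        return CompositeColumn(self, other, operator.mul)

    def __rmul__(self, other):
        return CompositeColumn(other, self, operator.mul)

    def __pow__(self, other):
        return CompositeColumn(self, other, operator.pow)


class ColumnRef(LazyExpr):
    """Reference to a column by name; values are looked up only on evaluate()."""

    def __init__(self, table, name):
        self.table = table
        self.name = name

    def evaluate(self):
        return list(self.table[self.name])


class CompositeColumn(LazyExpr):
    """Two terms (column, scalar or another CompositeColumn) and an operator."""

    def __init__(self, left, right, op):
        self.left, self.right, self.op = left, right, op

    def evaluate(self):
        return _apply(self.op, _evaluate(self.left), _evaluate(self.right))


# Hypothetical stand-in for a table: a dict of equal-length lists.
table = {'radius': [1.0, 2.0], 'temperature': [10.0, 20.0]}

# Nothing is computed here; `lum` is a nested CompositeColumn.
lum = 4 * ColumnRef(table, 'radius') ** 2 * ColumnRef(table, 'temperature') ** 4

# Add a row after the expression was defined.
table['radius'].append(3.0)
table['temperature'].append(1.0)

print(lum.evaluate())  # prints [40000.0, 2560000.0, 36.0]
```

Because the tree holds references rather than values, the row appended after the expression was built is picked up at evaluation time, which is the behaviour the original request asks for.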

maartenbreddels commented, Oct 23, 2018

Some comments on vaex:

  • Virtual columns are not stored (yet) in files; if you write out a file, it writes out the virtual column as real data. If you want to store the virtual expressions in files, you probably want/need to define a language-neutral math language that is easy to parse, like Polish notation. I also discussed this with @SylvainCorlay as an in-between language that Python and C++ could generate for a different module to optimize (think Numba, Pythran, llvm, numexpr). I guess there are a few dozen standards out there for such mathematical languages (basically a language-neutral numpy?).
  • Variables/constants/scalars are understood by vaex as well, for pi indeed.
  • ds['x'] = ds.y**2; ds['x'] = ds.x + 1 works in vaex, by rewriting all the virtual column expressions. The old column x is renamed to __x, where __ denotes a ‘hidden’ column, and all other expressions get modified to refer to this new hidden column, which by default does not get written out to disk.
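
The renaming trick in the last bullet can be sketched without vaex. What follows is a toy in plain Python, not vaex's implementation: `VirtualTable`, `set_virtual`, `evaluate` and `visible_columns` are hypothetical names, expressions are stored as strings, and `eval` is used only to keep the sketch short.

```python
import re


class VirtualTable:
    """Toy table: real columns hold data, virtual columns hold expression strings."""

    def __init__(self, data):
        self.data = data      # name -> list of values
        self.virtual = {}     # name -> expression string

    def set_virtual(self, name, expression):
        if name in self.virtual:
            # Pick an unused hidden name such as '__x' for the old definition.
            hidden = "__" + name
            while hidden in self.virtual:
                hidden = "__" + hidden
            pattern = re.compile(r"\b" + re.escape(name) + r"\b")
            # Point every existing expression at the hidden column.
            self.virtual = {key: pattern.sub(hidden, expr)
                            for key, expr in self.virtual.items()}
            self.virtual[hidden] = self.virtual.pop(name)
            # The new definition may refer to the old one, too.
            expression = pattern.sub(hidden, expression)
        self.virtual[name] = expression

    def evaluate(self, name):
        if name in self.data:
            return self.data[name]
        expression = self.virtual[name]
        # Resolve every identifier in the expression to a list of values.
        names = set(re.findall(r"[A-Za-z_]\w*", expression))
        columns = {n: self.evaluate(n) for n in names}
        n_rows = len(next(iter(columns.values())))
        return [eval(expression, {"__builtins__": {}},
                     {n: columns[n][i] for n in names})
                for i in range(n_rows)]

    def visible_columns(self):
        # Hidden columns are skipped, e.g. when writing to disk.
        names = list(self.data) + list(self.virtual)
        return [n for n in names if not n.startswith("__")]


ds = VirtualTable({"y": [1, 2, 3]})
ds.set_virtual("x", "y**2")
ds.set_virtual("z", "x * 2")
ds.set_virtual("x", "x + 1")   # redefine x in terms of its old value

print(ds.virtual)        # x is now '__x + 1', the old x lives on as '__x'
print(ds.evaluate("x"))  # prints [2, 5, 10]
```

After the redefinition, `z` still evaluates against the original `y**2` because its expression was rewritten to use `__x`, and `__x` itself is left out of `visible_columns()`.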

Having the whole expression system allowed me to have derivatives as well, and even better, propagation of errors/uncertainties using the whole covariance matrix (super useful for Gaia data for instance), see https://docs.vaex.io/en/latest/tutorial.html#Propagation-of-uncertainties for an example.

At this step it gets tricky because you then also need to simplify the expressions, which vaex does, so as not to end up with insanely large expressions. For that you could consider relying on sympy.
