
Supplement .interactive with an Interactive Pipe class


Background

hvPlot's .interactive is truly powerful.

I believe it has the same potential as Streamlit's API: it is simple, intuitive, beautiful and very powerful. And it comes without Streamlit's downsides:

The slowness of “run the script from top to bottom”
Spending time remembering to add caching and optimizing it
Spaghetti code

My Pain

.interactive only supports pipelines starting from a DataFrame. I would like the same power for any pipeline.

At a minimum, I would like all the read_XYZ functions, such as read_sql, read_csv and read_parquet, to be interactive. Most of the time my pipeline/data app also holds widgets that select which data to extract, as it would not be feasible to extract all the data from, for example, a large SQL table.
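A minimal sketch of letting a widget value decide which data is extracted. It uses the standard library's sqlite3 with an in-memory table as a stand-in for a large SQL table. The table name measurements and its columns are hypothetical. The selected country goes into a parameterized WHERE clause, so only the matching rows are read. With pandas, the same query and params could be passed to pd.read_sql.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory table standing in for a large SQL table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (country TEXT, value INTEGER)")
    rows = [
        (country, v)
        for country, base in {"DK": 0, "DE": 3, "US": 6}.items()
        for v in range(base, base + 3)
    ]
    conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    return conn


def extract(conn, country):
    """Read only the rows for the selected country; the ? placeholder keeps the query parameterized."""
    cur = conn.execute(
        "SELECT value FROM measurements WHERE country = ? ORDER BY value", (country,)
    )
    return [value for (value,) in cur.fetchall()]


conn = make_demo_db()
print(extract(conn, "DE"))  # [3, 4, 5]
```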

But sometimes my data is instead produced by a large, costly simulation run with specific arguments, and these arguments should also be presented as widgets.

If possible, I would like an API that supports beautiful, readable method chaining and feels like an integrated part of .interactive.

Reference Example

For example, I would like to be able to use the same intuitive API, with built-in caching, for this pipeline:

import time

import pandas as pd
import panel as pn


def extract(country):
    # In my case this would often be an expensive SQL query on a large table
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame({
        "col1": [value]*10,
        "col2": [value+1]*10,
        "col3": [value+2]*10,
    })

def transform(series: pd.DataFrame, col):
    time.sleep(1)
    return series[col].sum()

def pipeline(country, col):
    series = extract(country)
    return transform(series, col)

country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])
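To make the "built-in caching for each step" requirement concrete, here is a stdlib-only sketch of the reference pipeline in which each step is memoized with functools.lru_cache and keyed on the parameters that determine its output. The sketch rests on a few assumptions. extract returns a dict of tuples instead of a DataFrame. Call counters replace the time.sleep calls. The helper name transformed is hypothetical. Changing only col re-runs transform but not the expensive extract.

```python
import functools
from collections import Counter

calls = Counter()  # counts how often each step actually executes


@functools.lru_cache(maxsize=None)
def extract(country):
    calls["extract"] += 1  # stands in for the expensive SQL query
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    # Tuples, because the cached object is shared between callers and must not be mutated.
    return {"col1": (value,) * 10, "col2": (value + 1,) * 10, "col3": (value + 2,) * 10}


@functools.lru_cache(maxsize=None)
def transformed(country, col):
    # Keyed on all upstream parameters, because the input data depends on country.
    calls["transform"] += 1
    return sum(extract(country)[col])


def pipeline(country, col):
    return transformed(country, col)


print(pipeline("DK", "col1"))  # 0
print(pipeline("DK", "col2"))  # 10: transform re-runs, extract is a cache hit
print(calls["extract"], calls["transform"])  # 1 2
```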

Existing APIs

I can’t see how any of the existing Param/Panel APIs support this. The Panel Pipeline, for example, does not.

This would be a solution, though:

import time

import pandas as pd
import panel as pn


def extract(country):
    # In my case this would often be an expensive SQL query on a large table
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )


def transform(series: pd.DataFrame, col):
    time.sleep(1)
    return series[col].sum()


def pipeline(country, col):
    series = extract(country)
    return transform(series, col)


country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

ipipeline = pn.bind(pipeline, country=country_widget, col=col_widget)

pn.Column(country_widget, col_widget, pn.panel(ipipeline, loading_indicator=True)).servable()

https://user-images.githubusercontent.com/42288570/139796524-8da494fb-cb6e-4bd8-a009-fa2f2d5a06cc.mp4

But it does not:

allow method chaining
feel as simple and intuitive as .interactive
provide built-in caching for each step.

It feels like a different API from .interactive, and thus does not keep things simple.

Proposed API

Something like

ipipeline = (
    Interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

pn.panel(ipipeline).servable()
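Below is a minimal, framework-free sketch of what such a class could do under the hood. It is not an hvPlot or Panel API. Interactive and Param are hypothetical names, and Param only mimics a widget by exposing a .value. Each .pipe call records a step. On evaluation, the class reads the current parameter values and caches each step's result under a key built from its own parameter values plus all upstream ones, so changing a downstream widget does not re-run the extraction. Parameter values are assumed to be hashable, and the cache is unbounded.

```python
class Param:
    """Hypothetical stand-in for a widget: anything with a mutable .value."""

    def __init__(self, value):
        self.value = value


class Interactive:
    """Hypothetical chainable pipeline that caches the result of every step."""

    def __init__(self, func, **kwargs):
        self._steps = [(func, kwargs)]
        self._cache = {}

    def pipe(self, func, **kwargs):
        # Return a new pipeline with one more step; the cache is shared along the chain.
        new = object.__new__(Interactive)
        new._steps = self._steps + [(func, kwargs)]
        new._cache = self._cache
        return new

    def eval(self):
        key, result = (), None
        for i, (func, kwargs) in enumerate(self._steps):
            # Read the current value of every Param (values must be hashable).
            values = {k: (v.value if isinstance(v, Param) else v) for k, v in kwargs.items()}
            # Step i's key includes all upstream keys, so an upstream change invalidates it.
            key += ((func, tuple(sorted(values.items()))),)
            if key not in self._cache:
                self._cache[key] = func(**values) if i == 0 else func(result, **values)
            result = self._cache[key]
        return result


calls = []


def extract(country):
    calls.append(("extract", country))  # stands in for the expensive query
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return {"col1": [value] * 10, "col2": [value + 1] * 10, "col3": [value + 2] * 10}


def transform(data, col):
    calls.append(("transform", col))
    return sum(data[col])


country, col = Param("DK"), Param("col1")
ipipeline = Interactive(extract, country=country).pipe(transform, col=col)
print(ipipeline.eval())  # 0
col.value = "col2"
print(ipipeline.eval())  # 10 (extract is served from the cache)
```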

I would also like the Interactive class to recognize DataFrames and simply make them .interactive, so that this would also be possible:

ipipeline = Interactive(extract, country=country_widget)[col_widget].sum()

pn.panel(ipipeline).servable()
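Having Interactive recognize DataFrames essentially means forwarding indexing and method calls to whatever the pipeline returns, and resolving widget values at evaluation time. The following is a minimal sketch of that forwarding only, without caching. Deferred, Param and Column are hypothetical names. Column is a small list subclass standing in for a pandas Series, so the example runs without pandas.

```python
class Param:
    """Hypothetical stand-in for a widget: anything with a mutable .value."""

    def __init__(self, value):
        self.value = value


def _resolve(x):
    # Substitute the current value for a Param; pass anything else through.
    return x.value if isinstance(x, Param) else x


class Deferred:
    """Hypothetical lazy wrapper: records indexing and method calls, runs them on eval()."""

    def __init__(self, func, *args, **kwargs):
        # A zero-argument thunk that calls func with the *current* Param values.
        self._thunk = lambda: func(
            *[_resolve(a) for a in args], **{k: _resolve(v) for k, v in kwargs.items()}
        )

    def __getitem__(self, key):
        parent = self._thunk
        return Deferred(lambda: parent()[_resolve(key)])

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)  # keep private/dunder lookups normal
        parent = self._thunk

        def method(*args, **kwargs):
            return Deferred(lambda: getattr(parent(), name)(
                *[_resolve(a) for a in args], **{k: _resolve(v) for k, v in kwargs.items()}
            ))

        return method

    def eval(self):
        return self._thunk()


class Column(list):
    """Hypothetical stand-in for a pandas Series: just enough for .sum()."""

    def sum(self):
        return sum(self)  # the builtin sum, not this method


def extract(country):
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return {c: Column([value + i] * 10) for i, c in enumerate(["col1", "col2", "col3"])}


country, col = Param("DK"), Param("col1")
total = Deferred(extract, country=country)[col].sum()
print(total.eval())  # 0
country.value, col.value = "US", "col3"
print(total.eval())  # 80
```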

Additional Context

As the built-in caching would need to support arbitrary objects, results could be cached in memory, cached using diskcache, or cached using some method the user provides.
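One way to make the cache backend pluggable is to memoize each step into any dict-like object. In the sketch below, cached is a hypothetical helper. A plain dict keeps results in memory. The helper only needs `key in cache`, `cache[key]` and `cache[key] = value`. diskcache.Cache supports that mapping-style access, so in principle it could be passed instead (not exercised here), and a user-provided backend would need the same three operations. Keys combine the function's module and qualified name with its arguments, so they stay meaningful for a persistent backend. The arguments are assumed hashable.

```python
import functools


def cached(cache):
    """Hypothetical helper: memoize a pipeline step in any dict-like cache."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key on the function's identity plus its (hashable) arguments.
            key = (func.__module__, func.__qualname__, args, tuple(sorted(kwargs.items())))
            if key not in cache:
                cache[key] = func(*args, **kwargs)
            return cache[key]

        return wrapper

    return decorator


memory = {}  # in-memory backend
calls = []


@cached(memory)
def extract(country):
    calls.append(country)  # records real executions only
    return {"DK": 0, "DE": 3, "US": 6}[country]


extract("DE")
extract("DE")
extract("US")
print(calls)  # ['DE', 'US']
```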
Would an example of how to start an .interactive pipeline from a data extraction be enough to solve this pain?
Ahh. And what about pipelines that start from multiple DataFrame extractions and assemble them? It would also be nice to be able to make those .interactive with caching.
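For pipelines that assemble several extractions, per-step caching means only the extraction whose parameter changed should run again. The stdlib sketch below illustrates this with two independently memoized extractions feeding an assembly step. extract_sales, extract_prices and assemble are hypothetical names, and the extractions return tuples rather than DataFrames so the shared cached objects stay immutable.

```python
import functools
from collections import Counter

calls = Counter()  # counts real executions of each extraction


@functools.lru_cache(maxsize=None)
def extract_sales(country):
    """Hypothetical first extraction: quantity per product for one country."""
    calls["sales"] += 1
    base = {"DK": 1, "DE": 2, "US": 3}[country]
    return (("a", 10 * base), ("b", 20 * base))


@functools.lru_cache(maxsize=None)
def extract_prices(year):
    """Hypothetical second extraction: unit price per product for one year."""
    calls["prices"] += 1
    return (("a", year - 2000), ("b", 1))


def assemble(country, year):
    # Join the two cached extractions on product.
    prices = dict(extract_prices(year))
    return {product: qty * prices[product] for product, qty in extract_sales(country)}


print(assemble("DK", 2021))  # {'a': 210, 'b': 20}
print(assemble("DE", 2021))  # {'a': 420, 'b': 40}: prices come from the cache
print(calls["sales"], calls["prices"])  # 2 1
```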


Top GitHub Comments


maximlt commented, Oct 28, 2022

Looks like this has been implemented in https://github.com/holoviz/hvplot/pull/720 😃


MarcSkovMadsen commented, Nov 4, 2021

So, based on the previous example, we can define

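# Requires `import pandas as pd` and `import hvplot.pandas` (which adds .interactive)
def interactive(func, *args, **kwargs):
    # wrapper ignores the empty DataFrame that .pipe passes as its first argument
    def wrapper(_, *args, **kwargs):
        return func(*args, **kwargs)
    return (
        pd.DataFrame().interactive()
        .pipe(wrapper, *args, **kwargs)  # forward the caller's widgets, not a hard-coded one
    )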

and get an API like

ipipeline = (
    interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

So the main problem to solve is the missing caching.
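One way to add the missing caching to the interactive helper above is to memoize the extraction function before it is handed to .pipe. In the sketch below, cached_step is a hypothetical helper built on functools.lru_cache, which brings two assumptions. First, every argument must be hashable: that holds for the Select widget values here, but not for a DataFrame passed into transform. Second, the cached object is shared, so downstream steps must not mutate it in place. Inside the helper, .pipe(wrapper, *args, **kwargs) would become .pipe(cached_step(func), *args, **kwargs).

```python
import functools


def cached_step(func, maxsize=None):
    """Hypothetical helper: adapt func for the .pipe(wrapper, ...) trick, with memoization."""
    cached = functools.lru_cache(maxsize=maxsize)(func)

    def wrapper(_, *args, **kwargs):
        # `_` is the dummy (empty DataFrame) input that .pipe passes first; it is ignored.
        return cached(*args, **kwargs)

    wrapper.cache_info = cached.cache_info  # expose hit/miss statistics
    return wrapper


def extract(country):
    return {"DK": 0, "DE": 3, "US": 6}[country]


step = cached_step(extract)
step(None, country="DK")
step(None, country="DK")
print(step.cache_info().hits)  # 1
```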

Full Example

import time

import pandas as pd
import panel as pn
import hvplot.pandas


def extract(country):
    # In my case this would often be an expensive SQL query on a large table
    print(country)
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )


def transform(series: pd.DataFrame, col):
    time.sleep(1)
    print(col)
    return series[col].sum()



country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

def interactive(func, *args, **kwargs):
    def wrapper(_, *args, **kwargs):
        return func(*args, **kwargs)
    return (
        pd.DataFrame().interactive()
        .pipe(wrapper, *args, **kwargs)
    )

ipipeline = (
    interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)

pn.Column(
    ipipeline.widgets(),
    ipipeline.panel(loading_indicator=True),
).servable()