Supplement .interactive with an Interactive Pipe class
Background
hvPlot's .interactive is truly powerful.
I believe it has the same potential as Streamlit's API. It's simple, intuitive, beautiful and very powerful. And it comes without the downsides of
- Slowness of “run script from top to bottom”
- Spending time on remembering and optimizing caching.
- Spaghetti code
My Pain
.interactive only supports pipelines starting from a DataFrame. I would like the same power for any pipeline.
As a minimum I would like all the read_XYZ functions like read_sql, read_csv and read_parquet to be interactive. Most of the time my pipeline/data app also holds widgets for selecting which data to extract, as it would not be feasible to extract all the data, for example from a large SQL table.
But sometimes my data would also be extracted from a larger, costly simulation based on specific arguments. And these arguments should also be presented as widgets.
If possible I would like an API that supports beautiful, readable method chaining and feels like an integrated part of .interactive.
Reference Example
For example, I would like to be able to use the same intuitive API, with built-in caching, for this pipeline:
import time

import pandas as pd
import panel as pn

def extract(country):
    # In my case this would often be an expensive SQL query from a large table
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame({
        "col1": [value]*10,
        "col2": [value+1]*10,
        "col3": [value+2]*10,
    })

def transform(series: pd.DataFrame, col):
    time.sleep(1)
    return series[col].sum()

def pipeline(country, col):
    series = extract(country)
    return transform(series, col)

country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])
Existing APIs
I can't see how any of the existing Param/Panel APIs support this, not even the Pipeline.
This would be a solution though:
import time

import pandas as pd
import panel as pn

def extract(country):
    # In my case this would often be an expensive SQL query from a large table
    time.sleep(1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return pd.DataFrame(
        {
            "col1": [value] * 10,
            "col2": [value + 1] * 10,
            "col3": [value + 2] * 10,
        }
    )

def transform(series: pd.DataFrame, col):
    time.sleep(1)
    return series[col].sum()

def pipeline(country, col):
    series = extract(country)
    return transform(series, col)

country_widget = pn.widgets.Select(value="DK", options=["DK", "DE", "US"])
col_widget = pn.widgets.Select(value="col1", options=["col1", "col2", "col3"])

# Re-run the whole pipeline whenever either widget changes
ipipeline = pn.bind(pipeline, country=country_widget, col=col_widget)
pn.Column(country_widget, col_widget, pn.panel(ipipeline, loading_indicator=True)).servable()
But it does not
- allow method chaining
- feel as simple and intuitive as .interactive
- provide built-in caching for each step.
It feels like a different API from .interactive and thus does not keep things simple.
Proposed API
Something like:
ipipeline = (
    Interactive(extract, country=country_widget)
    .pipe(transform, col=col_widget)
)
pn.panel(ipipeline).servable()
I would also like the Interactive class to recognize DataFrames and just make them .interactive, so that this would also be possible:
ipipeline = Interactive(extract, country=country_widget)[col_widget].sum()
pn.panel(ipipeline).servable()
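To make the request concrete, here is a minimal pure-Python sketch of how the proposed `.pipe` chaining with per-step caching might behave. Everything in it is hypothetical: `Interactive` is not an existing hvPlot or Panel class, `Widget` is only a stand-in for a Panel widget with a `value` attribute, `eval` and `calls` are names invented for the illustration, and plain dicts replace the DataFrame so the sketch runs without pandas.

```python
class Widget:
    """Hypothetical stand-in for a Panel widget: it only holds a ``value``."""

    def __init__(self, value):
        self.value = value


class Interactive:
    """Hypothetical pipeline step that caches its result per widget state."""

    def __init__(self, func, _parent=None, **widgets):
        self._func = func
        self._parent = _parent
        self._widgets = widgets
        self._cache = {}
        self.calls = 0  # how many times ``func`` actually ran

    def pipe(self, func, **widgets):
        # Chain a new step that receives this step's result as first argument
        return Interactive(func, _parent=self, **widgets)

    def _key(self):
        # Cache key: widget values of this step and of every upstream step
        # (widget values are assumed to be hashable)
        own = tuple(sorted((name, w.value) for name, w in self._widgets.items()))
        upstream = self._parent._key() if self._parent is not None else ()
        return upstream + (own,)

    def eval(self):
        key = self._key()
        if key not in self._cache:
            kwargs = {name: w.value for name, w in self._widgets.items()}
            args = (self._parent.eval(),) if self._parent is not None else ()
            self._cache[key] = self._func(*args, **kwargs)
            self.calls += 1
        return self._cache[key]


def extract(country):
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return {"col1": [value] * 10, "col2": [value + 1] * 10, "col3": [value + 2] * 10}


def transform(table, col):
    return sum(table[col])


country_widget = Widget("DK")
col_widget = Widget("col1")

extract_step = Interactive(extract, country=country_widget)
ipipeline = extract_step.pipe(transform, col=col_widget)

first = ipipeline.eval()    # runs extract and transform
col_widget.value = "col2"
second = ipipeline.eval()   # reuses the cached extract, reruns transform only
```

Changing only `col_widget` leaves the upstream cache key untouched, so the expensive `extract` step is not repeated. A real implementation would also need to react to widget events rather than wait for an explicit `eval()` call.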
Additional Context
- As the built-in caching should now support arbitrary objects, they could either be memory cached, cached using diskcache, or cached using some method which the user can provide.
- Make an example of how to start an .interactive data extraction. Would that be enough to solve this pain?
- Ahh. What about pipelines that start from multiple extractions of dataframes and assemble them? It would also be nice to be able to make them .interactive with caching.
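The first and third points can be illustrated together with a small sketch in which the cache backend is simply any mutable mapping that the user passes in, and two separate extractions feed one combined result. The helper `cached`, the functions `extract_sales`, `extract_costs` and `profit`, and the sample numbers are all hypothetical. A plain dict plays the in-memory backend here; swapping in a disk-backed store with a dict-like interface is an assumption, not something tested in this sketch.

```python
from collections.abc import MutableMapping


def cached(func, cache: MutableMapping):
    """Wrap ``func`` so results are stored in a user-supplied mapping."""

    def wrapper(*args):
        key = (func.__name__, args)  # arguments are assumed to be hashable
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]

    return wrapper


calls = []  # records which extractions actually ran


def extract_sales(country):
    calls.append(("sales", country))
    return {"DK": 10, "DE": 20}[country]


def extract_costs(country):
    calls.append(("costs", country))
    return {"DK": 4, "DE": 5}[country]


memory_cache = {}  # the in-memory backend; any MutableMapping would do
cached_sales = cached(extract_sales, memory_cache)
cached_costs = cached(extract_costs, memory_cache)


def profit(sales_country, costs_country):
    # Assemble a result from two independently cached extractions
    return cached_sales(sales_country) - cached_costs(costs_country)


a = profit("DK", "DK")  # both extractions run
b = profit("DE", "DK")  # only the sales extraction runs again
```

Because each extraction has its own cache entries, changing one input reruns only the extraction that depends on it.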
Top GitHub Comments
Looks like this has been implemented in https://github.com/holoviz/hvplot/pull/720 😃
So based on the previous example we can define
and get an api like
So the main problem to solve is the missing caching.
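Until caching is built in, one possible stopgap is to memoize the expensive step with the standard library's `functools.lru_cache`. This is only a sketch under assumptions: the `extract` below is a simplified stand-in that returns an immutable tuple instead of a DataFrame, and the `maxsize` value is arbitrary.

```python
import functools
import time


@functools.lru_cache(maxsize=32)
def extract(country):
    # Stand-in for the expensive query; returns an immutable tuple on purpose
    time.sleep(0.1)
    value = {"DK": 0, "DE": 3, "US": 6}[country]
    return (value, value + 1, value + 2)


extract("DK")  # computed
extract("DK")  # served from the cache
info = extract.cache_info()
```

Note that `lru_cache` requires hashable arguments and hands back the same object on every hit, so a cached mutable result such as a DataFrame should be copied before it is modified.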
Full Example