Smart Config
Missing functionality
Configuration is always a big problem for me.
When I was a beginner with PP, I didn’t know how to set the various parameters for my data set and my ML use case, and the run time and memory usage were sometimes unacceptable.
Only recently, after reading all of the code and becoming familiar with all the implementations and configuration options, could I choose an efficient configuration. But you can’t expect every user to learn to configure their own case this way.
Some friends of mine always complain about how slow PP is, but I find that PP itself is actually not that slow; it’s the one-size-fits-all default configuration that makes PP slow.
What’s worse, some of the default config items become problems when I try to tune performance. For example, here are some running-time results from performance tests on a dual-core server (the benchmark generates HTML reports on some commonly used data sets):
Branch | Use_dask | bayesian_blocks | Repeat | Benchmark1(ms) | Benchmark2(ms) | Benchmark3(ms) | Benchmark4(ms) | Benchmark5(ms) |
---|---|---|---|---|---|---|---|---|
loopy-patch-fast | True | False | 10 | 5361 | 10151 | 12342 | 6013 | 1804 |
loopy-patch-fast | False | False | 10 | 16089 | 12734 | 16799 | 9680 | 1802 |
loopy-patch-fast | True | True | 10 | 35903 | 78999 | 92227 | 6945 | 1906 |
master | False | False | 10 | 17098 | 12697 | 16287 | 11783 | 1742 |
master(Default) | False | True | 10 | 39990 | 73714 | 86397 | 13032 | 1863 |
As the table above shows, `bayesian_blocks` (which defaults to True) accounts for more than 60% of the running time while producing an almost identical histogram on large data sets. What’s worse, the problem grows as the data set grows: on some data sets the ratio rises above 90%.
Different data sets should be handled differently to be both fast and effective. Otherwise, user experience and ease of use are greatly affected, especially for beginners; even complete and detailed documentation is not enough in this case.
In fact, when running on some large data sets, tweaking the config parameters and using parallel scheduling can save about 75%~95% of the time and produce an almost identical report.
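To make that concrete, here is a minimal sketch of the kind of tweaking meant above, assuming PP is pandas-profiling and that `ProfileReport` accepts config overrides such as `pool_size` and `minimal` as keyword arguments (names and availability vary between versions, so treat this as a sketch rather than the exact API):

```python
import multiprocessing

import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv("large_dataset.csv")

# Override the defaults that dominate running time on large data:
# use all available cores and skip the most expensive computations.
report = ProfileReport(
    df,
    title="Tuned report",
    pool_size=multiprocessing.cpu_count(),  # parallel scheduling
    minimal=True,                           # cheap, fast default profile
)
report.to_file("report.html")
```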
So I propose these two features below.
Proposed features
- Interactive config widget: since PP is usually used in notebooks, why not create an interactive config widget to help users (especially beginners) build their own config? It would improve the user experience a lot and make PP more convenient (see the first sketch after this list).
- Auto config: when no configuration is specified, generate a config according to the input data (see the second sketch after this list).
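A minimal sketch of the widget idea, assuming ipywidgets is available in the notebook; the two config fields shown (`pool_size`, `bayesian_blocks`) are illustrative, not PP’s actual config schema:

```python
import multiprocessing

import ipywidgets as widgets

# Illustrative config fields; PP's real keys may differ.
pool_size = widgets.IntSlider(
    value=1, min=1, max=multiprocessing.cpu_count(),
    description="pool_size",
)
bayesian_blocks = widgets.Checkbox(value=False, description="bayesian_blocks")

def current_config() -> dict:
    """Collect the widget state into a plain config dict."""
    return {
        "pool_size": pool_size.value,
        "bayesian_blocks": bayesian_blocks.value,
    }

# Display the widget in the notebook cell output.
widgets.VBox([pool_size, bayesian_blocks])
```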
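And a sketch of the ‘Auto config’ idea: derive the config from simple properties of the input data. The 100k-row threshold and the key names are placeholders; choosing real strategies and thresholds is exactly the testing work mentioned below:

```python
import multiprocessing

import pandas as pd

def auto_config(df: pd.DataFrame) -> dict:
    """Pick config values from the shape of the input data."""
    large = len(df) > 100_000  # placeholder threshold
    return {
        # Expensive adaptive binning only pays off on small data sets.
        "bayesian_blocks": not large,
        # Parallelism only pays off when there is enough work per core.
        "pool_size": multiprocessing.cpu_count() if large else 1,
    }
```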
Neither feature is difficult to implement, but ‘Auto config’ requires experience: someone has to run tests and carefully choose the strategies and thresholds.
Additional context
Recently I have been focusing on pipelining the project, tuning performance, and fixing related bugs, so I may not be able to implement these two features for now. That is also why I opened an issue here instead of sending a PR.
Hi, @neomatrix369!
The ‘Config Recommendation System’ you proposed is very promising, and I have had similar thoughts before, which I called ‘auto-config’. The key problem is exactly the one you mention: how do we find out what to recommend? I have thought about it for a long time without finding a proper solution. You have proposed a nice one, but I think it may still not be quite right.
As far as I know, PP is essentially a report-generation tool, and most of the configuration items describe user needs, not run-time parameters. So from the user’s perspective, the config is fixed for a given demand. For example, if I need correlations between variables and want to use A as the rejection threshold, I will still need them no matter how much time or memory the computation takes, and the config should not change.
As a result, a `recommended_configs` strategy may only be applicable to some runtime-related configuration items like `pool_size`. If we add more run-time control parameters later, it may become a nice move. (BTW, I think the root problem with the `bayesian_blocks` issue I mentioned earlier is that the third-party package implementing that feature does not scale on big data sets.)
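To illustrate that restriction, here is a sketch with hypothetical key names: only keys classified as runtime-related are eligible for recommendation, while keys encoding user intent are never overridden:

```python
# Hypothetical classification; PP's real config schema differs.
RUNTIME_KEYS = {"pool_size", "use_dask"}

def apply_recommendations(user_config: dict, recommended: dict) -> dict:
    """Merge recommended values into a config, touching only
    runtime-related items the user did not set explicitly."""
    merged = dict(user_config)
    for key, value in recommended.items():
        if key in RUNTIME_KEYS and key not in user_config:
            merged[key] = value
    return merged
```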
I have found two approaches that may improve the user experience around config:
The PR and the task-graph work are WIP and currently on hold. I am sorry this work is stalled, partly because of some mechanism-selection questions; I am occupied with other work related to computational graphs at the moment. Once I have some time, I will continue the previous work.
Initially we will have to collect data, and the right data; both could come through iterations. The data does not have to come from others: at first it will come from our own setups (machines, environments, etc…). When things mature, we can also gather samples from others to help fine-tune the internal model.
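A sketch of what that data collection could look like, with all names hypothetical: after each run, log the data set’s shape, the config used, and the observed run time, building up the training set for the internal model:

```python
import json
import time
from pathlib import Path

import pandas as pd

LOG_PATH = Path("config_runs.jsonl")  # hypothetical local log file

def log_run(df: pd.DataFrame, config: dict, runtime_s: float) -> None:
    """Append one (data stats, config, runtime) sample to the log."""
    sample = {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "config": config,
        "runtime_s": runtime_s,
        "logged_at": time.time(),
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(sample) + "\n")
```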
I have just put these ideas together after reading your post, so they need further thinking and experimentation, but I have a feeling the outline of the path is more or less right to walk on.
After reading your task-graph resource, I am more of the opinion that it is smart optimisation(s) at the pipeline end that we might need, as opposed to suggesting a single suitable configuration or a list of them.
I am still thinking that the system (whatever we call it, recommender or auto-config) can make suggestions/predictions about: