
Suggestions for implementing a composition-based optimization (i.e. fractional portion of ingredients)


For starters, my experience with Ax amounts to running the Loop tutorial once and reading through some of the documentation, such as the parameter types (i.e. I'm fairly new). I also have some familiarity with Bayesian optimization.

The actual use-case is slightly different and more complicated, but I think the following is a suitable toy example. I go over the problem statement, some setup code, and possible solutions. Would love to hear some feedback.

Problem Statement

Take a composite material with the following class/ingredient combinations:

  • Filler: Colloidal Silica (filler_A)
  • Filler: Milled Glass Fiber (filler_B)
  • Resin: Polyurethane (resin_A)
  • Resin: Silicone (resin_B)
  • Resin: Epoxy (resin_C)

Take some toy data of components and their fractional prevalences (various combinations of fillers and resins, with varying numbers of components) along with their objective values (training data), and some model that takes arbitrary input parameters and predicts the objective (strength), which we wish to maximize.

For constraints, I’m thinking:

  • limit the total number of components in any given “formula” (e.g. max of 3 components)
  • naturally, that the compositions sum to 1 (or that abs(1-sum(composition)) <= tol)
  • there has to be at least one filler and at least one resin (if feasible)
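As far as I can tell, Ax's `parameter_constraints` only accept linear inequalities, so the `abs(...) <= tol` condition would need to be split into two one-sided constraints. A minimal sketch of the intended feasibility logic in plain Python (the helper name, `tol`, and `max_components` values are illustrative assumptions, not Ax API):

```python
def satisfies_constraints(composition_by_choice, tol=1e-6, max_components=3):
    """Check the three proposed constraints for a candidate formula.

    `composition_by_choice` maps each ingredient name to its fraction.
    """
    fillers = {"filler_A", "filler_B"}
    resins = {"resin_A", "resin_B", "resin_C"}

    total = sum(composition_by_choice.values())
    n_components = sum(1 for v in composition_by_choice.values() if v > 0)

    # abs(1 - total) <= tol, rewritten as two one-sided (linear) inequalities
    sums_to_one = (total <= 1 + tol) and (total >= 1 - tol)
    has_filler = any(composition_by_choice.get(f, 0) > 0 for f in fillers)
    has_resin = any(composition_by_choice.get(r, 0) > 0 for r in resins)

    return sums_to_one and n_components <= max_components and has_filler and has_resin
```

Splitting the absolute value into `total <= 1 + tol` and `total >= 1 - tol` keeps each condition linear in the parameters, which is the form a linear-constraint interface can accept.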

Setup Code

To make it more concrete, it might look like the following:

choices = ["filler_A", "filler_B", "resin_A", "resin_B", "resin_C", "dummy"]

data = [
        [["filler_A", "filler_B", "resin_C"], [0.4, 0.4, 0.2]],
        [["filler_A", "resin_A", "resin_B"], [0.6, 0.2, 0.2]],
        [["filler_A", "filler_B", "resin_B"], [0.5, 0.3, 0.2]],
        [["filler_A", "resin_B", "dummy"], [0.5, 0.5, 0.0]],
        [["filler_B", "resin_C", "dummy"], [0.6, 0.4, 0.0]],
        [["filler_A", "filler_B", "resin_A"], [0.2, 0.2, 0.6]],
        [["filler_B", "resin_A", "resin_B"], [0.6, 0.2, 0.2]],
        ] # made-up data

def predict(objects, composition):
    # placeholder: some model mapping (components, fractions) -> predicted strength
    ...
    return obj

Possible Solutions

One-hot-like prevalence encoding and components/composition

One-hot-like prevalence encoding

I’ve thought about trying to do a sort of “one-hot encoding” (assuming I’m using this term correctly), such that each component gets its own composition as a variable:

filler_A  filler_B  resin_A  resin_B  resin_C
0.4       0.4       0.0      0.0      0.2
0.6       0.0       0.2      0.2      0.0
0.5       0.3       0.0      0.2      0.0
0.5       0.0       0.0      0.5      0.0
0.0       0.6       0.0      0.0      0.4
0.2       0.2       0.6      0.0      0.0
0.0       0.6       0.2      0.2      0.0

which I think would look like the following:

best_parameters, values, experiment, model = optimize(
    parameters=[
        {
            "name": "filler_A",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "filler_B",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_A",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_B",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_C",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ],
    experiment_name="composition_test",
    objective_name="strength",
    evaluation_function=predict,
    parameter_constraints=[
        # not sure if I can use `abs` here
        "abs(1 - (filler_A + filler_B + resin_A + resin_B + resin_C)) <= 1e-6",
        "filler_A + filler_B > 0",
        "resin_A + resin_B + resin_C > 0",
    ],
    total_trials=30,
)

However, this could easily lead to compositions where every component has a nonzero prevalence, which can be problematic from an experimental perspective.
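For reference, the one-hot-style table above can be generated from the toy data programmatically. A minimal sketch (the `choices` and first two rows of `data` are redefined here so the snippet is self-contained; `to_one_hot` is a hypothetical helper, not part of Ax):

```python
choices = ["filler_A", "filler_B", "resin_A", "resin_B", "resin_C", "dummy"]

data = [
    [["filler_A", "filler_B", "resin_C"], [0.4, 0.4, 0.2]],
    [["filler_A", "resin_A", "resin_B"], [0.6, 0.2, 0.2]],
]  # first two rows of the toy data, for brevity

def to_one_hot(row, columns=tuple(c for c in choices if c != "dummy")):
    """Expand a (components, fractions) pair into a dense row over all columns."""
    components, fractions = row
    prevalence = dict(zip(components, fractions))
    return [prevalence.get(col, 0.0) for col in columns]

one_hot_rows = [to_one_hot(row) for row in data]
```

Absent components simply get a prevalence of 0.0, which is what makes the encoding dense (and, as noted above, is also what lets the optimizer assign every component a nonzero fraction).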

components/composition

As I mentioned in the constraints, I’ve also thought about setting an upper limit to the number of components in a formula, which I think might look something like the following:

best_parameters, values, experiment, model = optimize(
    parameters=[
        {
            "name": "object1",
            "type": "choice",
            "values": choices,  # Ax choice parameters take `values` rather than `bounds`
        },
        {
            "name": "object2",
            "type": "choice",
            "values": choices,
        },
        {
            "name": "object3",
            "type": "choice",
            "values": choices,
        },
        {
            "name": "composition1",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "composition2",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "composition3",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ],
    experiment_name="composition_test",
    objective_name="strength",
    evaluation_function=predict,
    parameter_constraints=["abs(1 - (composition1 + composition2 + composition3)) <= 1e-6"],
    total_trials=30,
)
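With this parameterization, the `best_parameters` dict returned by `optimize` would still need to be decoded back into a formula. A hedged sketch of that post-processing (the handling of `dummy`, duplicates, and renormalization is my assumption about what would be sensible, not Ax behavior):

```python
def decode_parameters(parameters, tol=1e-6):
    """Collapse object{i}/composition{i} pairs into a formula dict.

    Duplicate choices have their fractions summed; `dummy` and
    near-zero entries are dropped; the result is renormalized to sum to 1.
    """
    formula = {}
    for i in (1, 2, 3):
        name = parameters[f"object{i}"]
        frac = parameters[f"composition{i}"]
        if name == "dummy" or frac <= tol:
            continue
        formula[name] = formula.get(name, 0.0) + frac
    total = sum(formula.values())
    return {name: frac / total for name, frac in formula.items()}

params = {
    "object1": "filler_A", "composition1": 0.5,
    "object2": "resin_B", "composition2": 0.5,
    "object3": "dummy", "composition3": 0.0,
}
formula = decode_parameters(params)
```

One wrinkle this exposes: nothing in the parameterization itself prevents `object1` and `object2` from being the same choice, so duplicates have to be merged (or penalized) somewhere downstream.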

How would you suggest implementing this use-case in Ax? If it would help, I’d be happy to flesh this out into a full MWE or try out any suggestions. The real use-case involves ~100 different components across 4 different classes, and the idea is to (eventually) use this in an experimental adaptive design scheme.

(tag @ramz-i who is the individual in charge of this project in our research group, post here if you have anything to add)


Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 24 (23 by maintainers)

Top GitHub Comments

sgbaird commented, Dec 15, 2021 (2 reactions)

Here is a roadmap of some outstanding items as well as other features/topics that came up along the way. I’ll plan on updating these later if the status changes (along with an “EDIT” keyword). If I may, I’m also cc’ing the people who have been participating or tagged in these discussions to bring attention to the “birds-eye” issue. Thank you to everyone who has helped me out so far. Your responses have been incredibly useful and informative. Personally, it’s been very rewarding for me to get a better understanding of BO through the lens of a practical use case while using what I consider to be an excellent platform for it.

@lena-kashtelyan @Balandat @bernardbeckerman @eytan @saitcakmak @Ryan-Rhys @qingfeng10 @dme65 (@ramseyissa who is lead on the project)

n_components < max_components constraint

“use my own surrogate model”

Adding multiple outcome measurements for fixed parameters as separate trials

  • ✔️ simply add them as separate trials with no SEM specified assuming a low number of observations, otherwise, convert many observations to mean and SEM #752

Incorporate input/parameter uncertainty

For example, when you mix a bunch of components together, but there is some uncertainty in the final composition of each component (e.g. instrument resolution, losses during synthesis)
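One way to get a feel for this, independent of Ax, is a generic Monte-Carlo sketch: perturb each fraction, renormalize, and look at the spread of the objective. The `predict` model below is entirely made up for illustration, and the noise model (independent Gaussian per fraction) is an assumption:

```python
import random

def predict(composition):
    """Stand-in model: a made-up linear 'strength', for illustration only."""
    weights = {"filler_A": 1.0, "filler_B": 0.8, "resin_A": 0.5}
    return sum(weights.get(k, 0.0) * v for k, v in composition.items())

def noisy_objective_samples(composition, sigma=0.01, n_samples=1000, seed=0):
    """Sample the objective under Gaussian noise on each fraction."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        noisy = {k: max(0.0, v + rng.gauss(0.0, sigma))
                 for k, v in composition.items()}
        total = sum(noisy.values())
        noisy = {k: v / total for k, v in noisy.items()}  # re-normalize to sum to 1
        samples.append(predict(noisy))
    return samples

samples = noisy_objective_samples({"filler_A": 0.4, "filler_B": 0.4, "resin_A": 0.2})
mean = sum(samples) / len(samples)
```

The spread of `samples` gives a rough sense of how much input uncertainty matters relative to the model's own predictive uncertainty, which seems like the first question to answer before building it into the optimization loop.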

Multi-objective optimization (e.g. strength, hardness)

While I haven’t mentioned this one yet, we are interested in converting this to a MOO scheme. Note: this is different from what I mentioned above, where I suggested using MOO to implement the n_components < max_components constraint. In this case, the MOO is for “real” outcomes (e.g. strength, hardness).

  • ✔️/❔ following the MOO tutorial should be fairly straightforward, but I’m not sure if MOO will be incompatible with any of the above-mentioned features
sgbaird commented, Dec 7, 2021 (2 reactions)

I adapted what I had from a Loop into a Service API and fixed some theory/understanding issues on my part in the linked example (https://github.com/facebook/Ax/issues/743#issuecomment-987778240), such that I’m now generating a real suggested next_experiment. The main adjustment to my understanding was that a single evaluation of hartmann6 in the examples corresponds to a wet-lab synthesis/characterization iteration for us.

I’m still struggling with the n_components < max_components constraint https://github.com/facebook/Ax/issues/745.

I’m also still confused about how I would replace the GPR surrogate model with my own (which has a built-in uncertainty output).
