RFC: Nesting Refactor for Encoder/Decoder Config

In the push to create a fully fledged schema using Marshmallow, we’ve run into a bit of an issue. When specifying things like optimizers, the structure of the config works well in a nested structure like this:

trainer:
    learning_rate: 0.001
    optimizer:
        type: adam
        beta_1: 0.9
        beta_2: 0.999

This works well for the schema because it lets us build a hierarchy of modules and submodules. The resulting schema object can be used internally for consistency in Ludwig, for populating parameter info on the frontend, and for building the config object in the SDK.

The problem lies in the config structure for encoders and decoders. Currently, encoder details are specified without any nesting: any encoder-specific parameters are added at the same level as the feature's own keys. For example:

input_features:
    - name: Feature_1
      type: binary
      encoder: dense
      dropout: 0.2
      activation: leakyRelu
    - name: Feature_2
      type: text
      encoder: stacked_cnn
      dropout: 0.2
      reduce_output: mean
      embeddings_on_cpu: false
      strides: 5
      output_size: 128

Here the encoder parameters sit directly at the input feature level, which is a problem for a few reasons:

The code required to validate a schema without a nested module structure becomes messy and convoluted, and this will only get worse as new encoders are added and as we lean on extensibility as a value proposition.
Specifying encoder parameters in the future SDK config object will be complicated and unintuitive, because you will have to add encoder-specific parameters to the input feature as opposed to the specific encoder you want to use.

Because of this, we are proposing a refactor of the encoder config structure that nests encoders an additional level to stay consistent with a module structure. For example:

Instead of this

input_features:
    - name: Feature_1
      type: binary
      encoder: dense
      dropout: 0.2
      activation: leakyRelu

We would do this

input_features:
    - name: Feature_1
      type: binary
      encoder:
          type: dense
          dropout: 0.2
          activation: leakyRelu
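
As a rough illustration of why the nested form is easier to validate, here is a minimal Python sketch that checks a feature's encoder block against a per-encoder parameter set. The `ENCODER_PARAMS` table and the `validate_nested_encoder` function are hypothetical names; the parameter sets are taken only from the examples in this issue and are not Ludwig's actual schema.

```python
from typing import List

# Hypothetical per-encoder parameter sets, copied from the examples in this
# issue only. These are NOT Ludwig's real parameter lists.
ENCODER_PARAMS = {
    "dense": {"dropout", "activation"},
    "stacked_cnn": {
        "dropout",
        "reduce_output",
        "embeddings_on_cpu",
        "strides",
        "output_size",
    },
}


def validate_nested_encoder(feature: dict) -> List[str]:
    """Return error messages for a feature that uses the nested encoder layout."""
    encoder = feature.get("encoder")
    if not isinstance(encoder, dict) or "type" not in encoder:
        return ["encoder must be a mapping with a 'type' key"]

    allowed = ENCODER_PARAMS.get(encoder["type"])
    if allowed is None:
        return ["unknown encoder type: {}".format(encoder["type"])]

    # Everything under `encoder` other than `type` must belong to that encoder.
    unknown = sorted(set(encoder) - allowed - {"type"})
    return [
        "unknown parameter for {}: {}".format(encoder["type"], name)
        for name in unknown
    ]
```

Because all encoder parameters live under one key, the check reduces to a lookup by `type`; with the flat layout the validator would first have to separate encoder parameters from feature-level keys.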

Note that this applies to decoders as well. If we decide to move forward with this proposal, we will need to add some logic to handle backwards compatibility for all existing Ludwig users. While this may seem like a headache, I believe it is something we definitely want to do, and doing it now rather than later is a good idea, since the impact of a change like this will only grow over time. Let me know your thoughts!
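
One possible shape for the backwards-compatibility logic is a small upgrade step that rewrites old flat configs into the nested form before validation. The sketch below is an assumption about how that could work, not Ludwig's implementation: `upgrade_feature`, `upgrade_config`, and the `FEATURE_LEVEL_KEYS` set are hypothetical, and a real version would need the full list of feature-level keys and the same treatment for decoders.

```python
import copy

# Keys assumed to belong to the feature itself in this sketch. Everything else
# on a flat feature is treated as an encoder parameter. A real implementation
# would need the complete list of feature-level keys.
FEATURE_LEVEL_KEYS = {"name", "type", "encoder"}


def upgrade_feature(feature: dict) -> dict:
    """Return a copy of `feature` with flat encoder parameters nested under `encoder`."""
    feature = copy.deepcopy(feature)
    encoder = feature.get("encoder")

    # Already nested, or no encoder specified: leave the feature unchanged.
    if encoder is None or isinstance(encoder, dict):
        return feature

    nested = {"type": encoder}
    for key in list(feature):
        if key not in FEATURE_LEVEL_KEYS:
            nested[key] = feature.pop(key)
    feature["encoder"] = nested
    return feature


def upgrade_config(config: dict) -> dict:
    """Upgrade every input feature without mutating the original config."""
    upgraded = copy.deepcopy(config)
    upgraded["input_features"] = [
        upgrade_feature(f) for f in config.get("input_features", [])
    ]
    return upgraded
```

Running the upgrade on an already nested config leaves it unchanged, so the step could be applied unconditionally when a config is loaded.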

Issue Analytics

Created a year ago
Comments: 7

Top GitHub Comments

connor-mccorm commented, Jun 14, 2022

I plan on scoping out the difficulty of a refactor after I have worked through option 2 to fill out the schema in the meantime.

tgaddair commented, Jun 12, 2022

@brightsparc we discussed the challenge of implementing the nested structure in terms of the refactoring effort. The consensus is that we should explore it to see where the complexity lies, if any, and how much effort would be required to resolve it.
