
Allow parsing multiple nested sub commands


What I am trying to do:

I am training an encoder-decoder neural network and I have multiple different encoders and decoders. Each encoder and decoder has a separate dataclass config.

I want to be able to specify from the command line which encoder to use by specifying its config name. Once I specify that, I also want to be able to adjust the selected config’s fields. So far I was able to accomplish this with subparsers:

I have something like

@dataclass
class Model:
    encoder: Union[RNNEncoder, ConvEncoder] = RNNEncoder() 
    decoder: RNNDecoder = RNNDecoder()

And I simply parse it just like in the subparsers example. I then specify the encoder keyword, and it lets me pass args specific to the selected architecture.

The problem is that if I want to do the same for the decoder, subparsers are no longer available and I get

error: cannot have multiple subparser arguments
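This limitation comes from argparse itself, which simple_parsing builds on: a parser allows only one subparser group, and a second add_subparsers() call is rejected outright. A minimal reproduction with plain argparse:

```python
import argparse

# Minimal reproduction: argparse permits only one subparser group per
# parser, so a second add_subparsers() call is rejected.
parser = argparse.ArgumentParser(prog="demo")
parser.add_subparsers(dest="encoder")
try:
    parser.add_subparsers(dest="decoder")
except SystemExit:
    # argparse prints "error: cannot have multiple subparser arguments"
    # to stderr and exits with status 2.
    print("second add_subparsers() call rejected")
```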

Ideal state

How can I solve my problem with simple_parsing? My ideal syntax would probably be to use the Union type in the dataclasses and call it like so:

python script.py --some_unrelated_args \
    encoder rnnencoder --args_related_to_rnnencoder \
    decoder convdecoder --args_related_to_convdecoder

Solutions & workarounds

There are quite a few suggestions in this SO thread. The issue can be solved with multiple subparsers and some user-side argv splitting. The following code allows me to call my script like so:

python script.py \
    --glob1.xx 7 \
    --glob2.xx 5 \
    encoder convencoder \
      --y 2 \
    decoder convdecoder \
      --n 7

And get

Namespace(decoder=Decoder(decoder=ConvDecoder(n=7)), encoder=Encoder(encoder=ConvEncoder(y=2)), glob1=Global1(xx=7, yy='hello'), glob2=Global2(xx=5, yy='hello'))

The code:

import sys
import itertools
from functools import partial

from dataclasses import dataclass

from typing import Union
from simple_parsing import ArgumentParser
from argparse import Namespace


@dataclass
class RNNEncoder:
    x: int = 1


@dataclass
class ConvEncoder:
    y: int = 2


@dataclass
class RNNDecoder:
    m: int = 3


@dataclass
class ConvDecoder:
    n: int = 4


@dataclass
class Encoder:
    encoder: Union[RNNEncoder, ConvEncoder] = RNNEncoder()


@dataclass
class Decoder:
    decoder: Union[RNNDecoder, ConvDecoder] = RNNDecoder()


@dataclass
class Global1:
    xx: int = 5
    yy: str = "hello"


@dataclass
class Global2:
    xx: int = 5
    yy: str = "hello"


parser = ArgumentParser()

parser.add_arguments(Global1, dest="glob1")
parser.add_arguments(Global2, dest="glob2")

sub = parser.add_subparsers()
encoder = sub.add_parser("encoder")
encoder.add_arguments(Encoder, dest="encoder")
decoder = sub.add_parser("decoder")
decoder.add_arguments(Decoder, dest="decoder")


def groupargs(arg, commands, currentarg=[None]):
    # Stateful key function for itertools.groupby: remembers the last
    # subcommand name seen, so each argument is grouped under the
    # preceding command (None for the leading global arguments).
    if arg in commands:
        currentarg[0] = arg
    return currentarg[0]


argv = "tmp.py --glob1.xx 7 --glob2.xx 5 encoder convencoder --y 2 decoder convdecoder --n 7".split()  # or sys.argv

# Split argv into (command, args) chunks: one chunk for the global
# arguments (command is None) and one chunk per subcommand.
commandlines = [
    (cmd, list(args))
    for cmd, args in itertools.groupby(argv, partial(groupargs, commands=sub.choices))
]
commandlines[0][1].pop(0)  # drop the script name from the global chunk

namespaces = dict()
for cmd, cmdline in commandlines:
    n, r = parser.parse_known_args(cmdline)
    assert len(r) == 0, f"Provided unknown args {r} for command {cmd}"
    if cmd is None:
        namespaces["global"] = n
    else:
        namespaces[cmd] = getattr(n, cmd)

args = Namespace(
    **vars(namespaces.pop("global")),
    **namespaces,
)
print(args)

The result looks correct and leverages simple_parsing’s argument name resolution etc., so it’s quite convenient. Some caveats and ugliness:

  1. Each Union-like option must be in a separate subparser and needs a separate dataclass
  2. Attributes from the Union-like classes are accessed like args.encoder.encoder, but it would be nicer to have args.encoder and get the final class straight away
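Caveat 2 can be smoothed over on the user side. A minimal sketch, assuming a hypothetical helper (flatten_single_field is not part of simple_parsing): any namespace attribute that is a wrapper dataclass with exactly one field gets replaced by that field’s value.

```python
from argparse import Namespace
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class ConvEncoder:
    y: int = 2


@dataclass
class Encoder:
    encoder: ConvEncoder = None


def flatten_single_field(ns: Namespace) -> Namespace:
    # For every attribute that is a wrapper dataclass instance with
    # exactly one field, replace the wrapper with that field's value.
    flat = {}
    for name, value in vars(ns).items():
        if is_dataclass(value) and not isinstance(value, type) and len(fields(value)) == 1:
            value = getattr(value, fields(value)[0].name)
        flat[name] = value
    return Namespace(**flat)


args = Namespace(encoder=Encoder(encoder=ConvEncoder(y=2)), xx=7)
print(flatten_single_field(args))  # Namespace(encoder=ConvEncoder(y=2), xx=7)
```

With this, args.encoder yields the concrete ConvEncoder directly instead of args.encoder.encoder.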

What would be better:

  1. Allow multiple Union fields in simple-parsing and handle the subparsers internally, as described above
  2. Directly fill in the user-selected class to the main class to allow args.glob1.encoder calls

The configs could look like this:

@dataclass
class Global:
    letter: Union[A, B] = A()
    number: Union[One, Two] = One()
    some_other_arg: int = 5

parser = ArgumentParser()
parser.add_arguments(Global, dest="glob")
args = parser.parse_args()

and we would get the same output as in the example.

Other stuff I tried:

I also tried using an Enum to store the configs for the encoder types; it lets me select the correct config, but I cannot adjust the selected config’s params. I also tried choice, but it did not accept a dataclass as an argument.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 12 (9 by maintainers)

Top GitHub Comments

1 reaction
idoby commented, May 3, 2022

Well, it’s a hack, but here goes.

I was testing out different optimizers and hyperparameters, and didn’t like that my experiments had this configuration code in them, so I made this (details omitted for clarity):

def optimizer(model_params, train_config: TrainConfig) -> torch.optim.Optimizer:
    if train_config.optimizer_type.lower() == 'adam':
        adam_config = poly_config(AdamConfig, 'adam_')
        return torch.optim.Adam(model_params, lr=train_config.lr, weight_decay=train_config.weight_decay,
                                betas=adam_config.betas)
    elif ...:

train_config is a dataclass that includes the optimizer_type field, which is read by SP from the command line; poly_config then injects the appropriate class and re-parses the command line at runtime. There’s nothing stopping me from making this more robust with automatic mappings etc. But this allows me to have a hierarchy of *Config types for each optimizer, where client code only ever sees the appropriate concrete type at runtime (e.g. AdamConfig, SGDConfig, etc.). The line where I use this class to create a torch Adam object is just to illustrate the point.

The resulting command line is: python myscript.py --lr=0.0001 --optimizer_type=adam --adam_betas=0.9 0.9

If you don’t supply adam_betas, you will either get a default value or an error if there’s no default. Once I automate this, I will also be able to throw an error if any arguments are left over when the whole parsing procedure is done (i.e. the user specified --adam_betas in an invalid context, such as without having selected Adam first).
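The poly_config helper itself isn’t shown in the comment. As a rough sketch of the described two-pass scheme with plain argparse (the CONFIGS registry and the poly_parse name are hypothetical, not idoby’s actual code): pass 1 reads only the selector, pass 2 re-parses the same argv with one prefixed option per field of the chosen config class.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class AdamConfig:
    lr: float = 1e-3
    beta1: float = 0.9


@dataclass
class SGDConfig:
    lr: float = 1e-2
    momentum: float = 0.0


# Hypothetical registry; real code could build this mapping automatically.
CONFIGS = {"adam": AdamConfig, "sgd": SGDConfig}


def poly_parse(argv):
    # Pass 1: read only the selector, ignoring everything else.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--optimizer_type", choices=CONFIGS, required=True)
    selected, _ = pre.parse_known_args(argv)

    # Pass 2: add one prefixed option per field of the chosen config
    # class, then re-parse the same argv into a concrete config instance.
    cls = CONFIGS[selected.optimizer_type]
    full = argparse.ArgumentParser(parents=[pre])
    for f in fields(cls):
        full.add_argument(f"--{selected.optimizer_type}_{f.name}", type=f.type, default=f.default)
    ns = full.parse_args(argv)
    return cls(**{f.name: getattr(ns, f"{selected.optimizer_type}_{f.name}") for f in fields(cls)})


print(poly_parse(["--optimizer_type=adam", "--adam_beta1=0.95"]))  # AdamConfig(lr=0.001, beta1=0.95)
```

Client code only ever sees the concrete type (AdamConfig or SGDConfig), matching the behavior described above.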

While it is a hack, doing hacks on top of argparse seems to be SP’s job. However, since SP subclasses AP and provides the same interface, I’m not sure multiple parsing passes are possible to implement in SP without breaking AP’s contract. I’ll need to think about this some more.

On second thought, it seems better for SP to stick to simple parsing and let my library do all the hacks on top of that.

1 reaction
lebrice commented, May 3, 2022

Greetings @janvainer, thanks for posting!

This is the eternal thorn in my foot, and it’s quite possible IMO that this issue of Argparse was one of the biggest motivations for other approaches like Hydra’s.

I’ve worked on this kind of thing a LOT. There is no neat way of doing this in SP, and I don’t quite see a clean way forward.

I implemented something like this in my project in a few lines but it would be nice if SP had built-in support.

Talk is cheap @idoby . Show me the code! 😛

As for your question @janvainer, the best solution I’ve found so far is to add every combination of nested subparsers. It sounds ugly, but it’s actually pretty neat. The only thing you have to watch out for is the order in which the arguments are passed. It’s also a bit buggy with lists and other field types that generate nargs='*' arguments.
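The "every combination" trick can be illustrated with plain argparse: each choice in the first subparser group gets its own nested copy of the second group's subcommands. This is a simplified sketch covering only the dataset-then-model ordering (names are illustrative, not the actual branch code):

```python
import argparse

# Every dataset subcommand gets its own nested copy of the model
# subcommands, so both choices can be made in one invocation.
datasets = ["mnist", "imagenet"]
models = ["simple_cnn", "resnet"]

parser = argparse.ArgumentParser(prog="demo")
dataset_sub = parser.add_subparsers(dest="dataset", required=True)
for dataset in datasets:
    dataset_parser = dataset_sub.add_parser(dataset)
    model_sub = dataset_parser.add_subparsers(dest="model", required=True)
    for model in models:
        model_sub.add_parser(model)

args = parser.parse_args(["mnist", "resnet"])
print(args)  # Namespace(dataset='mnist', model='resnet')
```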

I called this feature add_independent_subparsers. There’s a proof-of-concept on the lebrice/multiple_subparsers branch, which I’ll probably turn into a PR soon.

Here is an example:

from typing import ClassVar, Tuple
from simple_parsing.helpers.independent_subparsers import add_independent_subparsers

from dataclasses import dataclass
from simple_parsing import ArgumentParser
from pathlib import Path
import os


@dataclass
class DatasetConfig:
    """ Configuration options for the dataset. """

    image_size: ClassVar[Tuple[int, int, int]]

    # Number of samples to keep. (-1 to keep all samples).
    n_samples: int = -1
    # Whether to shuffle the dataset.
    shuffle: bool = True


@dataclass
class MnistConfig(DatasetConfig):
    """ Configuration options for the MNIST dataset. """

    image_size: ClassVar[Tuple[int, int, int]] = (28, 28, 1)
    n_samples: int = 10_000  # some random number just for sake of illustration.
    shuffle: bool = True
    foo: str = "foo_hey"


@dataclass
class ImageNetConfig(DatasetConfig):
    """ Configuration options for the ImageNet dataset. """

    image_size: ClassVar[Tuple[int, int, int]] = (28, 28, 1)
    n_samples: int = 10_000_000  # some random number just for sake of illustration.
    shuffle: bool = False
    # Path to the imagenet directory.
    path: Path = Path(os.environ.get("IMAGENET_DIR", "data/imagenet"))


@dataclass
class ModelConfig:
    """ Configuration options for the Model. """

    # Learning rate.
    lr: float = 3e-4


@dataclass
class SimpleCNNConfig(ModelConfig):
    """ Configuration options for a simple CNN model. """

    lr: float = 1e-3


@dataclass
class ResNetConfig(ModelConfig):
    """ Configuration options for the ResNet model. """

    lr: float = 1e-6


def main():
    parser = ArgumentParser(description=__doc__)
    add_independent_subparsers(
        parser,
        dataset={"mnist": MnistConfig, "imagenet": ImageNetConfig},
        model={"simple_cnn": SimpleCNNConfig, "resnet": ResNetConfig},
    )

    args = parser.parse_args()
    dataset_config: DatasetConfig = args.dataset
    model_config: ModelConfig = args.model

    print(f"Args: {args}")
    print(f"Dataset config: {dataset_config}")
    print(f"Model config: {model_config}")


if __name__ == "__main__":
    main()

$ python examples/multiple_subparsers/multiple_subparsers_example.py --help
usage: multiple_subparsers_example.py [-h] <dataset>|<model> ...

optional arguments:
  -h, --help         show this help message and exit

dataset or model:
  <dataset>|<model>
    mnist            Configuration options for the MNIST dataset. (help)
    imagenet         Configuration options for the ImageNet dataset. (help)
    simple_cnn       Configuration options for a simple CNN model. (help)
    resnet           Configuration options for the ResNet model. (help)
$ python examples/multiple_subparsers/multiple_subparsers_example.py mnist --help
usage: multiple_subparsers_example.py mnist [-h] [--n_samples int] [--shuffle bool] [--foo str] <model> ...

 Configuration options for the MNIST dataset. (desc.)

optional arguments:
  -h, --help       show this help message and exit

model:
  <model>
    simple_cnn     Configuration options for a simple CNN model. (help)
    resnet         Configuration options for the ResNet model. (help)

MnistConfig ['command_0']:
   Configuration options for the MNIST dataset. 

  --n_samples int  some random number just for sake of illustration. (default: 10000)
  --shuffle bool   (default: True)
  --foo str        (default: foo_hey)
$ python examples/multiple_subparsers/multiple_subparsers_example.py mnist resnet
Args: Namespace(dataset=MnistConfig(n_samples=10000, shuffle=True, foo='foo_hey'), model=ResNetConfig(lr=1e-06))
Dataset config: MnistConfig(n_samples=10000, shuffle=True, foo='foo_hey')
Model config: ResNetConfig(lr=1e-06)

Things to watch out for:

  • Ordering of arguments:

$ python examples/multiple_subparsers/multiple_subparsers_example.py mnist resnet --shuffle
usage: multiple_subparsers_example.py [-h] <dataset>|<model> ...
multiple_subparsers_example.py: error: unrecognized arguments: --shuffle

  • Optional arguments (e.g. bools):

$ python examples/multiple_subparsers/multiple_subparsers_example.py mnist --shuffle resnet
usage: multiple_subparsers_example.py mnist [-h] [--n_samples int] [--shuffle bool] [--foo str] <model> ...
multiple_subparsers_example.py mnist: error: argument --shuffle: Boolean value expected for argument, received 'resnet'

Here’s the correct way to pass the shuffle argument for the dataset, for example:

$ python examples/multiple_subparsers/multiple_subparsers_example.py mnist --shuffle=True resnet
Args: Namespace(dataset=MnistConfig(n_samples=10000, shuffle=True, foo='foo_hey'), model=ResNetConfig(lr=1e-06))
Dataset config: MnistConfig(n_samples=10000, shuffle=True, foo='foo_hey')
Model config: ResNetConfig(lr=1e-06)

It’s not a perfect solution, and there are probably bugs lurking in there, but it should hopefully be useful to you or to others, even in its current form.

Hope this helps. If you want to use it, I suggest copying the contents of this function here, or waiting until I’ve merged this to master with a PR.
