
RFC: Asteroid CLI design


Here’s my draft for the Asteroid CLI design. I guess it’s a radical change from what we have at the moment…

Let’s discuss only the design here, not the implementation. I have already given the implementation some thought and have a prototype for parts of the design, but let’s agree on a design first.

Please don’t be afraid to criticise what you don’t like. It’s likely that I forgot about, or didn’t know of, some use cases when coming up with the design.


Design goals

  • Separate models, datasets, and experiments (= a model trained on a dataset) from each other.
  • Deduplicate common code.
  • Provide a consistent and convenient interface for users.

API design

Starting from scratch

Assume you start with an empty hard disk and want to train a model from scratch.

Steps:

  • Install Asteroid
  • Create dataset config
  • Create model config
  • Run training
  • Run evaluation

Create dataset config (Download and prepare dataset)

Prepare = Create mixtures, create JSON files, etc.

Download dataset from official URL:

$ asteroid data librimix download
Downloading LibriMix dataset to /tmp/librimix-raw...

Prepare the dataset, if necessary. Some datasets don’t need preparation; for those, the prepare command is absent.

$ asteroid data librimix prepare --n-speakers 2 --raw /tmp/librimix-raw --target ~/asteroid-datasets/librimix2
Found LibriMix dataset in /tmp/librimix-raw.
Creating LibriMix2 (16 kHz) in ~/asteroid-datasets/librimix2...  # "prepare" never modifies the raw downloads; always creates a copy.
Wrote dataset config to ~/asteroid-datasets/librimix2/dataset.yml.

Generated dataset.yml:

dataset: "asteroid.data.LibriMix"
n_speakers: 2
train_dir: data/tt
val_dir: data/cv
...
sample_rate: 16000
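
One natural way for a loader to consume the dataset.yml above is to resolve the dotted path in the `dataset:` key to a class at runtime. A minimal sketch of that idea (the `resolve` helper is an assumption for illustration, not part of the RFC; it is demonstrated with a stdlib class since the Asteroid classes are hypothetical here):

```python
import importlib

def resolve(dotted_path):
    """Resolve a dotted path like "asteroid.data.LibriMix" to the object it names."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)

# Demonstrated with a stdlib class, since this is only a sketch:
OrderedDict = resolve("collections.OrderedDict")
```

The remaining keys of dataset.yml (`n_speakers`, `train_dir`, …) could then be passed to the resolved class as keyword arguments.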

Pass options to prepare:

$ asteroid data librimix prepare --n-speakers 3 --sample-rate 8000 --raw /tmp/librimix-raw --target ~/asteroid-datasets/librimix3
Found LibriMix dataset in /tmp/librimix-raw.
Creating LibriMix3 (8 kHz) in ~/asteroid-datasets/librimix3...  # "prepare" never modifies the raw downloads; always creates a copy.
Wrote dataset config to ~/asteroid-datasets/librimix3/dataset.yml.

dataset.yml:

dataset: "asteroid.data.LibriMix"
n_speakers: 3
sample_rate: 8000
train_dir: data/tt
val_dir: data/cv
...

Create model config

Models have a separate config from datasets (and from experiments, see below). Create one with configure:

$ asteroid model convtasnet configure > ~/asteroid-models/convtasnet-default.yml
$ asteroid model convtasnet configure --n-filters 1337 > ~/asteroid-models/convtasnet-larger.yml

Generated convtasnet-default.yml:

n_filters: 512
kernel_size: 16
...

Run training

$ asteroid train --model ~/asteroid-models/convtasnet-default.yml --data ~/asteroid-datasets/librimix2/dataset.yml
Saving training parameters to exp/train_convtasnet_exp1/experiment.yml
Training epoch 0/100...

The generated experiment.yml (an experiment = a training or evaluation run) contains model info, dataset info, and training info:

data:
  # (Copy of dataset.yml)
  dataset: "asteroid.data.LibriMix"
  n_speakers: 2
  sample_rate: 16000
  train_dir: data/tt
  val_dir: data/cv
  ...
model:
  # (Copy of convtasnet-default.yml)
  model: "asteroid.models.ConvTasNet"
  n_filters: 512
  kernel_size: 16
  ...
training:
  optim:
    optimizer: "adam"
    ...
  batch_size: 5
  max_epochs: 100
  ...
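
Assembling experiment.yml is then just nesting the three configs under fixed keys, so a single file reproduces the whole run. A sketch (the function name and field values are illustrative):

```python
def make_experiment_config(data_cfg, model_cfg, training_cfg):
    # experiment.yml = the dataset, model, and training configs nested
    # under fixed keys, so one file fully describes the experiment.
    return {
        "data": dict(data_cfg),
        "model": dict(model_cfg),
        "training": dict(training_cfg),
    }

experiment = make_experiment_config(
    {"dataset": "asteroid.data.LibriMix", "n_speakers": 2, "sample_rate": 16000},
    {"model": "asteroid.models.ConvTasNet", "n_filters": 512, "kernel_size": 16},
    {"optim": {"optimizer": "adam"}, "batch_size": 5, "max_epochs": 100},
)
```

Dumping `experiment` with a YAML serializer would produce a file shaped like the one above.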

Change model, dataset, or training params in place:

$ asteroid train --model ~/asteroid-models/convtasnet-default.yml --data ~/asteroid-datasets/librimix2/dataset.yml --n-filters 1234 --sample-rate 8000 --batch-size 5 --max-epochs 50
Saving training parameters to exp/train_convtasnet_exp2/experiment.yml
Warning: Resampling dataset to 8 kHz.
Training epoch 0/50...
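
The overrides above are flat flags (`--n-filters`, `--sample-rate`, `--batch-size`) that land in different sections of experiment.yml. One possible resolution rule, sketched here as a guess since the RFC doesn’t pin one down: route each flag to the section whose config already defines that key.

```python
def apply_overrides(experiment, overrides):
    # Route each flat CLI override to the section whose config defines that
    # key. (This resolution rule is an assumption, not part of the RFC.)
    merged = {section: dict(cfg) for section, cfg in experiment.items()}
    for key, value in overrides.items():
        for section in ("model", "data", "training"):
            if key in merged[section]:
                merged[section][key] = value
                break
        else:
            raise KeyError(f"no section defines --{key.replace('_', '-')}")
    return merged

base = {
    "data": {"dataset": "asteroid.data.LibriMix", "sample_rate": 16000},
    "model": {"model": "asteroid.models.ConvTasNet", "n_filters": 512},
    "training": {"batch_size": 4, "max_epochs": 100},
}
run = apply_overrides(base, {"n_filters": 1234, "sample_rate": 8000, "max_epochs": 50})
```

A downside worth discussing: if two sections ever share a key name, this rule silently picks the first match, so explicit prefixes (e.g. `--training.batch-size`) might be safer.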

Continue training from checkpoint:

$ asteroid train --continue exp/train_convtasnet_exp1/
Creating experiment folder exp/train_convtasnet_exp3/...
Saving training parameters to exp/train_convtasnet_exp3/experiment.yml
Continuing training from checkpoint 42.
Training epoch 43/100...
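
Note that in the transcripts the `_expN` counter appears to be global across the train/eval/pretrained prefixes. A sketch of how the CLI might pick the next folder name (helper name and rule are assumptions):

```python
import os
import re

def next_exp_dir(root, prefix):
    """Pick the next exp/<prefix>_expN folder, with N global across prefixes."""
    suffix = re.compile(r"_exp(\d+)$")
    taken = [int(m.group(1)) for name in os.listdir(root)
             if (m := suffix.search(name))]
    return os.path.join(root, f"{prefix}_exp{max(taken, default=0) + 1}")
```

So with `train_convtasnet_exp1` and `train_convtasnet_exp2` already present, the next run gets `_exp3` whether it is a train or an eval experiment.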

Run evaluation

$ asteroid eval --experiment exp/train_convtasnet_exp3/
Saving training parameters to exp/eval_convtasnet_exp4/experiment.yml
Evaluating ConvTasNet on LibriMix2...

Can change training params for eval:

$ asteroid eval --experiment exp/train_convtasnet_exp3/ --batch-size 10
Saving training parameters to exp/eval_convtasnet_exp5/experiment.yml
Evaluating ConvTasNet on LibriMix2...

Eval on different dataset:

$ asteroid eval --experiment exp/train_convtasnet_exp3/ --data ~/asteroid-datasets/wsj0
Saving training parameters to exp/eval_convtasnet_exp6/experiment.yml
Evaluating ConvTasNet on WSJ0...

Starting from pretrained

$ asteroid download-pretrained "mpariente/DPRNN-LibriMix2-2020-08-13"
Downloading DPRNN trained on LibriMix2 to exp/pretrained_dprnn_exp7...
$ ls exp/pretrained_dprnn_exp7
- dprnn_best.pth
- experiment.yml
...

Eval pretrained:

$ asteroid eval --experiment exp/pretrained_dprnn_exp7/ --data ~/asteroid-datasets/wsj0
Saving training parameters to exp/eval_dprnn_exp8/experiment.yml
Evaluating DPRNN on WSJ0...

Finetune pretrained on custom dataset:

$ asteroid train --continue exp/pretrained_dprnn_exp7 --data /my/dataset.yml --batch-size 123
...

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12
Top GitHub Comments

2 reactions
popcornell commented, Aug 13, 2020

We will also do an inference engine, I think, with a CLI interface something like:

$ asteroid infer --experiment exp/train_convtasnet_exp7/ --folder /media/sam/separate --output /folder/separated --window_size 32000

The idea is to parse everything recursively and then use overlap-add to separate every file and save it to an output folder.
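
A minimal sketch of that overlap-add idea (function and parameter names are placeholders; a real model would return one signal per speaker, and here `separate_chunk` stands in for the model and must return a list the same length as its input):

```python
def separate_overlap_add(signal, separate_chunk, window_size=32000, hop=16000):
    """Run a separation callable on overlapping chunks and overlap-add the results."""
    out = [0.0] * len(signal)
    count = [0] * len(signal)
    for start in range(0, len(signal), hop):
        chunk = signal[start:start + window_size]
        for i, sample in enumerate(separate_chunk(chunk)):
            out[start + i] += sample
            count[start + i] += 1
    # Average where windows overlap.
    return [s / c for s, c in zip(out, count)]
```

With a rectangular window and simple averaging, an identity "model" reconstructs the input exactly; a production version would likely use a tapered window for smoother seams.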

0 reactions
mpariente commented, Aug 14, 2020

Thanks for the clarification. I do agree with you and cannot wait to see the first implementation!
