Reduce parallelism and split validation per language
Today, there is one validation workflow per building block (new quickstarts), and within each there is a matrix multiplying programming language by protocol/SDK. This helps avoid validation failures caused by leftover state from a previous scenario, but it triggers too many jobs, exhausting the available workers for other workflows.
Instead, split the workflows per programming language. Within each workflow, multiplex only per protocol, and run validation for each building block in a loop inside the workflow. This way there is still isolation for each building block, but far fewer jobs are triggered.
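For illustration with hypothetical counts: with, say, 8 building blocks, 5 languages, and 2 variants, the current layout triggers 8 × 5 × 2 = 80 jobs per run, while a per-language layout triggers only 5 × 2 = 10.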
Summary:
- Move from one workflow file per building block to one workflow per language in .github/workflows
- The matrix covers only the variant: [sdk, http]
- Building blocks run in sequence in a single step, with a loop that understands the folder structure, so when a new building-block quickstart is added for a language it is automatically picked up without requiring code changes (see the workflow sketch after this list):

```sh
# Get the list of building blocks from the folder names (or from an env variable in global.env).
for building_block in */; do
  building_block=${building_block%/}
  # Skip building blocks that do not have a quickstart for this language/variant yet.
  [ -d "$building_block/$language/$variant" ] || continue
  (cd "$building_block/$language/$variant" && make validate)
done
```
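A minimal sketch of what one such per-language workflow could look like, assuming a hypothetical file name like .github/workflows/validate-python.yml, a folder layout of `<building_block>/<language>/<variant>/`, and a `make validate` target in each quickstart; the trigger events and checkout action version below are placeholders, not taken from the repository:

```yaml
# Hypothetical per-language workflow, e.g. .github/workflows/validate-python.yml
name: Validate python quickstarts

on: [push, pull_request]        # placeholder triggers

jobs:
  validate:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        variant: [sdk, http]    # only the protocol/SDK variant is multiplexed
    env:
      LANGUAGE: python          # one workflow file per language
    steps:
      - uses: actions/checkout@v3
      - name: Validate all building blocks
        run: |
          # Building blocks are discovered from top-level folders, so a new
          # quickstart is picked up automatically without workflow changes.
          for building_block in */; do
            building_block=${building_block%/}
            target="$building_block/$LANGUAGE/${{ matrix.variant }}"
            [ -d "$target" ] || continue   # skip blocks without this language/variant
            make -C "$target" validate
          done
```

With this layout, adding a new building block only requires following the existing folder convention; the workflow files themselves do not change.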
Issue Analytics
- Created: 7 months ago
- Comments: 9 (9 by maintainers)
Top GitHub Comments
Note @ASHIQUEMD is signing on to drive this. Thank you MD!
I like that there's still some split. The split by building block has been good: any new building block we introduce does not affect the others that are in good shape, and we'll add building blocks more often than we add languages. Also, you sort of have to "test in main" with these things, so I think authoring will get harder. But if it makes daily runtime better, I'm still good with trying something and optimizing that way.
We should definitely solve the problem of too many jobs being triggered.