Reduce parallelism and split validation per language
Today, there is one validation workflow per building block (new quickstarts), and within each there is a matrix multiplying programming language by protocol/SDK. This helps avoid validation failures caused by leftover state from a previous scenario, but it triggers too many jobs, exhausting the available workers for other workflows.
Instead, split the workflows per programming language. Within each workflow, multiplex only per protocol, and run validation for each building block in a loop inside the workflow. This way there is still isolation for each building block, but far fewer jobs are triggered.
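For illustration with hypothetical counts: with, say, 8 building blocks, 5 languages, and 2 variants, the current layout triggers 8 × 5 × 2 = 80 jobs per run, while a per-language layout triggers only 5 × 2 = 10.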
Summary:
- Move from one workflow file per building block to one workflow per language in .github/workflows
- The matrix covers only the variant: [sdk, http]
- Building blocks run in sequence in a single step, with a loop that understands the folder structure, so when a new building-block quickstart is added for a language it is automatically picked up without requiring code changes (see the workflow sketch after this list):

```sh
# Get the list of building blocks from the folder names (or from an env variable in global.env).
for building_block in */; do
  building_block=${building_block%/}
  # Skip building blocks that do not have a quickstart for this language/variant yet.
  [ -d "$building_block/$language/$variant" ] || continue
  (cd "$building_block/$language/$variant" && make validate)
done
```
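A minimal sketch of what one such per-language workflow could look like, assuming a hypothetical file name like .github/workflows/validate-python.yml, a folder layout of `<building_block>/<language>/<variant>/`, and a `make validate` target in each quickstart; the trigger events and checkout action version below are placeholders, not taken from the repository:

```yaml
# Hypothetical per-language workflow, e.g. .github/workflows/validate-python.yml
name: Validate python quickstarts

on: [push, pull_request]        # placeholder triggers

jobs:
  validate:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        variant: [sdk, http]    # only the protocol/SDK variant is multiplexed
    env:
      LANGUAGE: python          # one workflow file per language
    steps:
      - uses: actions/checkout@v3
      - name: Validate all building blocks
        run: |
          # Building blocks are discovered from top-level folders, so a new
          # quickstart is picked up automatically without workflow changes.
          for building_block in */; do
            building_block=${building_block%/}
            target="$building_block/$LANGUAGE/${{ matrix.variant }}"
            [ -d "$target" ] || continue   # skip blocks without this language/variant
            make -C "$target" validate
          done
```

With this layout, adding a new building block only requires following the existing folder convention; the workflow files themselves do not change.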
Issue Analytics
- Created: 7 months ago
- Comments: 9 (9 by maintainers)
Top GitHub Comments
Note @ASHIQUEMD is signing on to drive this. Thank you MD!
I like that there's still some split. The split by building block has been good: any new building block we introduce does not affect the others that are in good shape, and we'll add building blocks more often than we add languages. Also, you sort of have to "test in main" with these things, so I think authoring will get harder. But if it makes daily runtime better, I'm still good with trying something and optimizing that way.
We should definitely solve the problem of too many jobs being triggered.