
Small and medium-sized dataset suitability for Ax, with GPU computing. Is 15000 too many?

See original GitHub issue

I am using ~15000 samples across 8 dimensions (15000 x 8). Incidentally, I am also using Ray Tune per the Ax tutorial. New samples are somewhat expensive, around 20 min to 20 hrs per simulation depending on the chosen parameters. I am hoping to run somewhere between 100 and 1000 iterations of adaptive design, with a max_parallel of 8 to 12 depending on the number of CPUs available.

Does this seem feasible with consumer hardware (e.g. RTX 2060-Ti) and less than a week of runtime?

I will probably do some forecasting to see how the runtime scales with the number of training points for this problem, and use that to decide whether to switch to a genetic algorithm (or downselect the training points to a more manageable size). I figured it was worth asking whether anyone has experience running with over 10000 initialized points.
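
A minimal sketch of that runtime forecasting, assuming synthetic stand-in data (the objective function, size grid, and dtype below are illustrative, not from the issue): time BoTorch's stock SingleTaskGP fit at increasing training-set sizes and extrapolate the growth.

```python
import time

import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def time_gp_fit(n: int, d: int = 8) -> float:
    """Time one exact-GP fit on n synthetic points in d dimensions."""
    # Synthetic stand-in for the real 15000 x 8 dataset (illustrative only).
    X = torch.rand(n, d, dtype=torch.double, device=device)
    Y = X.sin().sum(dim=-1, keepdim=True)
    Y = Y + 0.05 * torch.randn_like(Y)
    model = SingleTaskGP(X, Y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    start = time.perf_counter()
    fit_gpytorch_mll(mll)
    return time.perf_counter() - start


# Exact GP inference scales roughly O(n^3) in time and O(n^2) in memory,
# so timings at small n give a usable forecast for n = 15000.
for n in (500, 1000, 2000, 4000, 8000):
    print(f"n={n:>5}: {time_gp_fit(n):.1f} s")
```

Note that on a consumer card, memory may bind before runtime does: a dense 15000 x 15000 covariance matrix in double precision is already about 1.8 GB before any intermediate buffers, which is part of why the kernel-swapping and KeOps suggestions in the comments below help.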

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 15 (15 by maintainers)

Top GitHub Comments

2 reactions
sgbaird commented, Apr 1, 2022

@Balandat thanks for this! This looks pretty doable with the tutorial at https://botorch.org/tutorials/custom_botorch_model_in_ax (swapping out the kernel, as described in the link you mentioned). Perhaps there's a simpler way of swapping out the kernel as well. I will also give this a try!
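
For reference, a hedged sketch of what that kernel swap looks like under the linked tutorial's pattern (the class name and the Matern kernel choice below are illustrative assumptions, not the issue's final configuration): define a minimal GPyTorch ExactGP with the desired covariance module and mix in BoTorch's GPyTorchModel so Ax can drive it.

```python
from botorch.models.gpytorch import GPyTorchModel
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import MaternKernel, ScaleKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.means import ConstantMean
from gpytorch.models import ExactGP


class CustomKernelGP(ExactGP, GPyTorchModel):
    _num_outputs = 1  # tells BoTorch this is a single-output model

    def __init__(self, train_X, train_Y):
        # GPyTorch's ExactGP expects a 1-d target tensor.
        super().__init__(train_X, train_Y.squeeze(-1), GaussianLikelihood())
        self.mean_module = ConstantMean()
        # The swap happens here: any GPyTorch kernel can be dropped in.
        self.covar_module = ScaleKernel(
            MaternKernel(nu=2.5, ard_num_dims=train_X.shape[-1])
        )
        self.to(train_X)  # match dtype/device of the training data

    def forward(self, x):
        return MultivariateNormal(self.mean_module(x), self.covar_module(x))
```

Only covar_module needs to change to try a different kernel; wiring the class into Ax's generation strategy is what the tutorial itself covers.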

I can't give high enough praise for how responsive and helpful everyone has been over the last few months.

1 reaction
sgbaird commented, Apr 1, 2022

@saitcakmak I tried out your suggestion, and it certainly seems to reduce memory consumption. I'm still planning to give KeOps a try.
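
@saitcakmak's suggestion itself isn't quoted in this excerpt, so the following is only a sketch of the KeOps route mentioned here: GPyTorch ships KeOps variants of its standard kernels that evaluate covariance entries lazily instead of materializing the full n x n matrix. The nu and ard_num_dims values are assumptions matching this problem's 8 inputs.

```python
# Hedged sketch: a KeOps kernel drop-in, so the 15000 x 15000 covariance
# matrix is never fully materialized in GPU memory.
# Requires the optional pykeops dependency (pip install pykeops).
from gpytorch.kernels import ScaleKernel
from gpytorch.kernels.keops import MaternKernel as KeOpsMaternKernel

# Swap this in as covar_module in a custom model like the sketch above;
# the rest of the model definition is unchanged.
covar_module = ScaleKernel(KeOpsMaternKernel(nu=2.5, ard_num_dims=8))
```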

