
Small and medium-sized dataset suitability for Ax, with GPU computing. Is 15000 too many?

See original GitHub issue

I am using ~15000 samples across 8 dimensions (15000 x 8). Incidentally, I am also using Ray Tune per the Ax tutorial. New samples are somewhat expensive, around 20 min to 20 hrs per simulation depending on the chosen parameters. I am hoping to run somewhere between 100 and 1000 iterations of adaptive design, with a max_parallel of 8 to 12 depending on the number of CPUs available.

Does this seem feasible with consumer hardware (e.g. RTX 2060-Ti) and less than a week of runtime?

I will probably do some forecasting to see how the runtime scales with the number of training points for this problem, and use that to decide whether to switch to a genetic algorithm (or downselect the training points to a more manageable size). I figured it was worth asking whether anyone has experience running with over 10000 initialized points.
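
A minimal sketch of that runtime forecasting, assuming synthetic stand-in data (the objective function, size grid, and dtype below are illustrative, not from the issue): time BoTorch's stock SingleTaskGP fit at increasing training-set sizes and extrapolate the growth.

```python
import time

import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def time_gp_fit(n: int, d: int = 8) -> float:
    """Time one exact-GP fit on n synthetic points in d dimensions."""
    # Synthetic stand-in for the real 15000 x 8 dataset (illustrative only).
    X = torch.rand(n, d, dtype=torch.double, device=device)
    Y = X.sin().sum(dim=-1, keepdim=True)
    Y = Y + 0.05 * torch.randn_like(Y)
    model = SingleTaskGP(X, Y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    start = time.perf_counter()
    fit_gpytorch_mll(mll)
    return time.perf_counter() - start


# Exact GP inference scales roughly O(n^3) in time and O(n^2) in memory,
# so timings at small n give a usable forecast for n = 15000.
for n in (500, 1000, 2000, 4000, 8000):
    print(f"n={n:>5}: {time_gp_fit(n):.1f} s")
```

Note that on a consumer card, memory may bind before runtime does: a dense 15000 x 15000 covariance matrix in double precision is already about 1.8 GB before any intermediate buffers, which is part of why the kernel-swapping and KeOps suggestions in the comments below help.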

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 15 (15 by maintainers)

Top GitHub Comments

2 reactions
sgbaird commented, Apr 1, 2022

@Balandat thanks for this! This looks pretty doable with the tutorial at https://botorch.org/tutorials/custom_botorch_model_in_ax (swapping out the kernel, as described in the link you mentioned). Perhaps there's a simpler way of swapping out the kernel as well. I will also give this a try!
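
For reference, a hedged sketch of what that kernel swap looks like under the linked tutorial's pattern (the class name and the Matern kernel choice below are illustrative assumptions, not the issue's final configuration): define a minimal GPyTorch ExactGP with the desired covariance module and mix in BoTorch's GPyTorchModel so Ax can drive it.

```python
from botorch.models.gpytorch import GPyTorchModel
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import MaternKernel, ScaleKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.means import ConstantMean
from gpytorch.models import ExactGP


class CustomKernelGP(ExactGP, GPyTorchModel):
    _num_outputs = 1  # tells BoTorch this is a single-output model

    def __init__(self, train_X, train_Y):
        # GPyTorch's ExactGP expects a 1-d target tensor.
        super().__init__(train_X, train_Y.squeeze(-1), GaussianLikelihood())
        self.mean_module = ConstantMean()
        # The swap happens here: any GPyTorch kernel can be dropped in.
        self.covar_module = ScaleKernel(
            MaternKernel(nu=2.5, ard_num_dims=train_X.shape[-1])
        )
        self.to(train_X)  # match dtype/device of the training data

    def forward(self, x):
        return MultivariateNormal(self.mean_module(x), self.covar_module(x))
```

Only covar_module needs to change to try a different kernel; wiring the class into Ax's generation strategy is what the tutorial itself covers.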

I can't give high enough praise for how responsive and helpful everyone has been over the last few months.

1 reaction
sgbaird commented, Apr 1, 2022

@saitcakmak I tried out your suggestion, and it certainly seems to reduce memory consumption. I'm still planning to give KeOps a try.
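
@saitcakmak's suggestion itself isn't quoted in this excerpt, so the following is only a sketch of the KeOps route mentioned here: GPyTorch ships KeOps variants of its standard kernels that evaluate covariance entries lazily instead of materializing the full n x n matrix. The nu and ard_num_dims values are assumptions matching this problem's 8 inputs.

```python
# Hedged sketch: a KeOps kernel drop-in, so the 15000 x 15000 covariance
# matrix is never fully materialized in GPU memory.
# Requires the optional pykeops dependency (pip install pykeops).
from gpytorch.kernels import ScaleKernel
from gpytorch.kernels.keops import MaternKernel as KeOpsMaternKernel

# Swap this in as covar_module in a custom model like the sketch above;
# the rest of the model definition is unchanged.
covar_module = ScaleKernel(KeOpsMaternKernel(nu=2.5, ard_num_dims=8))
```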

