[Question] reusing existing experiment data.
The data I’m trying to reuse comes from orthogonal search subspaces, e.g. the complete search space consists of [h1,h2,h3,h4], and the data I want to reuse is the result of searching in [h1,h4] and [h2,h3].
I attached all the existing data (>50 for each subspace) to a GPEI model and used it to generate new trials. Here are my questions.
- Is this way of reusing data encouraged? If not, what is needed to achieve this goal?
- When generating new trials, sometimes I get errors like
NaNs encounterd when trying to perform matrix-vector multiplication
and sometimes it succeeds. I think it may be related to the random seed of the Bayesian model, is that right? How can I avoid this?
- Training and visualizing w/ the reduced hparam/metric set is OK. However, when visualizing w/ the full hparam/metric set using existing data, it seems that I have to generate one trial with the Bayesian model first. Is there any way to let the model be aware of the attached data w/o generating the next trial (which I think is memory consuming)?
Thanks!
Issue Analytics
- State:
- Created 4 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
@showgood163
If you add dimensions to your search space, a single point on a new dimension from past data doesn’t allow the model to estimate a Gaussian Process lengthscale for that dimension (this is likely what leads to the numerical errors).
The right thing to do here would be a small re-initialization of the entire new space, then re-adding the old data.
If you are extending a dimension that already covers a range, then starting with the past data directly can work, but initializing with a Sobol search in the full new SearchSpace is always the most stable solution.
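To check whether old data leaves a dimension pinned to a single value, a minimal sketch like the following may help. It is plain Python and does not call Ax. The search space bounds, the fixed values for h2 and h3, and the helper names are hypothetical, and uniform random sampling stands in for a Sobol sequence.

```python
import random

# Hypothetical full search space: parameter name -> (lower, upper) bounds.
FULL_SPACE = {
    "h1": (0.0, 1.0),
    "h2": (0.0, 1.0),
    "h3": (0.0, 1.0),
    "h4": (0.0, 1.0),
}


def distinct_counts(trials, space=FULL_SPACE):
    """Count how many distinct values were observed on each dimension."""
    return {name: len({t[name] for t in trials}) for name in space}


def degenerate_dims(trials, space=FULL_SPACE, min_distinct=2):
    """Return dimensions with too few distinct values to inform a lengthscale."""
    counts = distinct_counts(trials, space)
    return sorted(name for name, n in counts.items() if n < min_distinct)


def random_init(n, space=FULL_SPACE, seed=0):
    """Draw n uniform random points over the full space (stand-in for Sobol)."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        for _ in range(n)
    ]


# Old data from the [h1, h4] subspace; h2 and h3 are assumed to have been
# held at fixed values (0.5 here is an illustrative assumption).
old_trials = [
    {"h1": 0.1 * i, "h2": 0.5, "h3": 0.5, "h4": 1.0 - 0.1 * i}
    for i in range(5)
]

# Dimensions that the old data alone cannot inform.
before = degenerate_dims(old_trials)

# Re-initialize over the full space, then re-add the old data.
combined = random_init(8) + old_trials
after = degenerate_dims(combined)

print("degenerate before:", before)
print("degenerate after:", after)
```

With the old data alone, h2 and h3 are reported as degenerate. After the full-space initialization points are combined with the old data, every dimension has more than one observed value. In a real experiment the initialization points would also need to be evaluated before being attached.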
@showgood163, didn’t mean to belabor what you already knew, was just answering the question of whether it’s possible to let the model be aware of the attached data w/out generating more trials : )
Also, just to make sure: you needed to do `fetch_data` with an `Experiment` (which you retrieved from `AxClient`), not a `SimpleExperiment`, correct? For `SimpleExperiment`, `eval` is the correct function.