Simulation - Deterministic model results in different recall curves when not seeded

The CLI help suggests that two seeds are available: one for the model and one for the prior knowledge.

  --seed SEED           Seed for models. Use integer between 0 and 2^32 - 1.
  ...
  --init_seed INIT_SEED
                        Seed for setting the prior indices if the --prior_idx option is not used. If the option --prior_idx is used with one or more index, this option is ignored.
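To make the role of the second seed concrete, here is a minimal sketch of how a seed could fix the choice of prior records. It is not ASReview's implementation: `pick_prior_indices`, its parameters, and the default of one included and one excluded record are all hypothetical, chosen only to illustrate the idea.

```python
import random


def pick_prior_indices(labels, init_seed=None, n_included=1, n_excluded=1):
    """Hypothetical helper: choose prior records with a dedicated RNG.

    Illustration only, not ASReview's code. `labels` holds 1 for
    relevant records and 0 for irrelevant ones.
    """
    # A private generator keeps this choice independent of any other seed.
    # With init_seed=None the generator is seeded from OS entropy.
    rng = random.Random(init_seed)
    included = [i for i, y in enumerate(labels) if y == 1]
    excluded = [i for i, y in enumerate(labels) if y == 0]
    chosen = rng.sample(included, n_included) + rng.sample(excluded, n_excluded)
    return sorted(chosen)


labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# The same init seed always yields the same prior indices.
first = pick_prior_indices(labels, init_seed=1)
second = pick_prior_indices(labels, init_seed=1)
print(first == second)
```

In this sketch the prior selection is reproducible on its own, but it says nothing about randomness used later in the simulation, which is what the other seed is for.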

Running the simulation twice with both seeds set produces identical results.

asreview simulate hall -s result_run_3.h5 --init_seed 1 --seed 1
asreview simulate hall -s result_run_4.h5 --init_seed 1 --seed 1

asreview plot result_run_3.h5 --type 'inclusion' -o plot_run_3.png
asreview plot result_run_4.h5 --type 'inclusion' -o plot_run_4.png

[Images: plot_run_4, plot_run_3]

The default model is the deterministic NB (naive Bayes) model, so I thought I could skip the --seed value. However, the two runs then produce different results. Any thoughts?

asreview simulate hall -s result_run_1.h5 --init_seed 1
asreview simulate hall -s result_run_2.h5 --init_seed 1

asreview plot result_run_1.h5 --type 'inclusion' -o plot_run_1.png
asreview plot result_run_2.h5 --type 'inclusion' -o plot_run_2.png

[Images: plot_run_1, plot_run_2]

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

J535D165 commented, Oct 1, 2020

Okay, makes sense. Clarifying the documentation will do, won’t it?

  --seed SEED           Seed for the active learning cycle (classifiers, balance strategies, 
                        and query strategies). Use integer between 0 and 2^32 - 1.
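The proposed wording implies that the seed covers more than the classifier. A toy sketch of why that matters: even when the scoring function is fully deterministic, a random resampling step in the loop makes unseeded runs diverge. Everything here (`score`, `simulate`, the resampling rule, the toy data) is an assumption for illustration and does not reflect ASReview's actual classifiers or balance strategies.

```python
import random


def score(record, training_sample):
    # Deterministic toy "classifier": the closer a record is to a sampled
    # relevant record, the higher its score. Same inputs, same output.
    return -min(abs(record - t) for t in training_sample)


def simulate(seed=None, n_records=200, n_queries=20):
    """Toy active learning loop (illustration only, not ASReview's code)."""
    rng = random.Random(seed)  # seed=None seeds from OS entropy
    relevant = set(range(0, n_records, 7))  # toy ground truth
    labelled = [0, 7, 14, 3, 5]  # fixed prior knowledge
    queried = []
    for _ in range(n_queries):
        known_relevant = [r for r in labelled if r in relevant]
        # Toy "balance strategy": train on a random subsample. This is the
        # only source of randomness in the loop.
        k = max(1, len(known_relevant) // 2)
        training_sample = rng.sample(known_relevant, k)
        pool = [r for r in range(n_records) if r not in labelled]
        # Query the highest-scoring unlabelled record.
        best = max(pool, key=lambda r: score(r, training_sample))
        labelled.append(best)
        queried.append(best)
    return queried


# Seeded runs are reproducible.
print(simulate(seed=1) == simulate(seed=1))
```

Calling `simulate()` without a seed can return a different query order on each run, even though `score` itself never changes, which matches the behaviour described in the issue.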

J535D165 commented, Oct 1, 2020

Anyone interested in opening a PR for this (the docs)?
