
What do `epochs_per_sample`, `epoch_of_next_sample` and `epochs_per_negative_sample` mean in the `optimize_layout()` function?


Source code of the `optimize_layout()` function: (image not preserved)

According to this page, the gradient is:

(image of the gradient formula, not preserved)

  1. Where is the term Vij in the source code?
  2. What do epochs_per_sample, epoch_of_next_sample and epochs_per_negative_sample mean in the optimize_layout() function?

Can anyone give me some hints on this? Thanks!

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

2 reactions
lmcinnes commented, Nov 12, 2018

Since UMAP makes use of the sampling-based approach employed by LargeVis, the goal is to sample edges proportionally to the weights v_ij. In practice this is done by drawing an edge every t epochs, where t is chosen according to the relative proportion of v_ij. This is the epochs_per_sample vector, telling you how many epochs pass between successive samplings of a given edge. The epoch_of_next_sample vector simply keeps track of when to next sample each edge (in the obvious way). Finally, epochs_per_negative_sample plays a similar role for the negative sampling.
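To make the scheduling concrete, here is a minimal NumPy sketch of the idea described above. It is a simplified illustration, not UMAP's actual code: the helper names `make_epochs_per_sample` and `simulate_schedule` are used here for illustration, the default `negative_sample_rate=5` is an assumption, and the real attractive/repulsive embedding updates are replaced by counters. It also hints at the answer to question 1: v_ij never appears in the update itself, because it is encoded in how often each edge gets sampled (an edge with half the maximum weight is visited every 2 epochs instead of every epoch).

```python
import numpy as np


def make_epochs_per_sample(weights, n_epochs):
    """Epochs between successive samplings of each edge.

    The heaviest edge gets 1.0 (sampled every epoch); an edge with half the
    maximum weight gets 2.0, and so on. Zero-weight edges get -1.0 (never sampled).
    """
    weights = np.asarray(weights, dtype=np.float64)
    result = -1.0 * np.ones(weights.shape[0], dtype=np.float64)
    n_samples = n_epochs * (weights / weights.max())
    mask = n_samples > 0
    result[mask] = float(n_epochs) / n_samples[mask]
    return result


def simulate_schedule(weights, n_epochs, negative_sample_rate=5):
    """Replay the sampling schedule and count, per edge, how many attractive
    (positive) and repulsive (negative) updates would be performed.
    The embedding itself is not touched; counters stand in for the updates."""
    epochs_per_sample = make_epochs_per_sample(weights, n_epochs)
    epoch_of_next_sample = epochs_per_sample.copy()
    epochs_per_negative_sample = epochs_per_sample / negative_sample_rate
    epoch_of_next_negative_sample = epochs_per_negative_sample.copy()

    n_edges = epochs_per_sample.shape[0]
    positive = np.zeros(n_edges, dtype=np.int64)
    negative = np.zeros(n_edges, dtype=np.int64)
    for n in range(n_epochs):
        for i in range(n_edges):
            if epochs_per_sample[i] <= 0:
                continue  # zero-weight edge: never sampled
            if epoch_of_next_sample[i] <= n:
                positive[i] += 1  # the attractive update along edge i would go here
                epoch_of_next_sample[i] += epochs_per_sample[i]
                # Negative samples "owed" to this edge since it was last processed.
                n_neg = int((n - epoch_of_next_negative_sample[i])
                            / epochs_per_negative_sample[i])
                negative[i] += n_neg  # repulsive updates vs. random vertices would go here
                epoch_of_next_negative_sample[i] += n_neg * epochs_per_negative_sample[i]
    return positive, negative


pos, neg = simulate_schedule([1.0, 0.5, 0.25], n_epochs=200)
print(pos)  # -> [199  99  49]
print(neg)  # roughly negative_sample_rate (5) times pos for each edge
```

The positive counts come out roughly proportional to the weights (about 2:1 and 4:1), which is the "sample edges proportionally to v_ij" behaviour, while each sampled edge also triggers about `negative_sample_rate` negative samples.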

All of this is really just an optimization trick: it goes a little faster this way (since we are already doing the negative sampling, we are in a sampling-style regime) than the more obvious approach. I’m still working to find ways to make the more obvious thing “fast enough” and hope to eventually switch away, as the sampling is not ideal, for exactly the reason you ask this question: it is not obvious to a reader of the code what is going on.

1 reaction
jlmelville commented, Nov 20, 2018

For the record, the relevant parts of the LargeVis code are at: https://github.com/lferry007/LargeVis/blob/feb8121e8eb9652477f7f564903d189ee663796f/Linux/LargeVis.cpp#L554

You can see the weights being raised to the power of 0.75. Just below that, the neg_table is built from these values, and it is then sampled from during negative sampling: https://github.com/lferry007/LargeVis/blob/feb8121e8eb9652477f7f564903d189ee663796f/Linux/LargeVis.cpp#L600

neg_table is just a really big array (of length 1e8) filled with vertex ids, each repeated in proportion to its value in weights.
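A rough NumPy sketch of the same idea, as a hedged illustration rather than a port of the LargeVis C++ code. It assumes `vertex_weights` already holds one non-negative weight per vertex (how LargeVis derives those weights is not reproduced here), fills the table through a cumulative-distribution lookup, and uses a tiny `table_size` instead of 1e8. The function names `build_neg_table` and `draw_negative_samples` are hypothetical.

```python
import numpy as np


def build_neg_table(vertex_weights, table_size):
    """Fill an array of length `table_size` with vertex ids, each id occupying
    a share of slots proportional to weight**0.75."""
    w = np.asarray(vertex_weights, dtype=np.float64) ** 0.75
    cdf = np.cumsum(w) / w.sum()
    # Slot k represents the point (k + 0.5) / table_size of the unit interval
    # and is assigned to the vertex whose CDF interval contains that point.
    points = (np.arange(table_size) + 0.5) / table_size
    table = np.searchsorted(cdf, points, side="right")
    return np.minimum(table, len(w) - 1)  # guard against float round-off at the top end


def draw_negative_samples(neg_table, n_samples, rng):
    """Negative sampling: read vertex ids from uniformly random table slots."""
    return neg_table[rng.integers(0, len(neg_table), size=n_samples)]


table = build_neg_table([1.0, 16.0], table_size=9000)
print(np.bincount(table))  # 16**0.75 is 8, so a 1:8 split -> [1000 8000]
rng = np.random.default_rng(0)
negatives = draw_negative_samples(table, 5, rng)  # five random vertex ids, biased toward vertex 1
```

Building the table once makes each negative draw a single array lookup, at the cost of memory; the 0.75 exponent flattens the distribution so low-weight vertices are drawn somewhat more often than their raw weights would suggest.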
