non_zero_dists might be size 0


This does not seem to be an issue when the function is annotated with

@numba.njit(
    fastmath=True
) 

When the decorator is removed, however, the function sometimes throws:

rho[i] = interpolation * non_zero_dists[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

This happens when non_zero_dists = ith_distances[ith_distances > 0.0] selects no elements, so non_zero_dists is an empty array and non_zero_dists[0] is out of bounds. Example: 2 points, n=2, local_connectivity=0.0

https://github.com/lmcinnes/umap/blob/d5d995625b5cfe55430771e4ec2f044533da7b4c/umap/umap_.py#L118
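The failure can be reproduced with plain NumPy, without UMAP itself. The sketch below is not the library's code: the input array, the helper name first_positive_or_default and its default value are assumptions made for illustration. It shows the empty selection, the resulting IndexError, and one possible guard. A likely reason the error does not surface under @numba.njit is that Numba does not check array bounds by default, so the out-of-bounds read goes unnoticed.

```python
import numpy as np

# Hypothetical input: distances from one point to its neighbors when every
# neighbor coincides with it, so all distances are zero.
ith_distances = np.array([0.0, 0.0])
local_connectivity = 0.0
interpolation = local_connectivity - np.floor(local_connectivity)

# Boolean masking keeps only strictly positive distances; none survive here.
non_zero_dists = ith_distances[ith_distances > 0.0]

try:
    rho_i = interpolation * non_zero_dists[0]
    error_message = None
except IndexError as exc:
    # Plain NumPy checks bounds and raises on the empty array.
    error_message = str(exc)


def first_positive_or_default(dists, default=0.0):
    """Return the first strictly positive entry, or `default` if there is none.

    Illustrative guard only; the default of 0.0 is an assumption, not the
    behavior chosen upstream.
    """
    positive = dists[dists > 0.0]
    if positive.shape[0] == 0:
        return default
    return float(positive[0])


print(error_message)
print(first_positive_or_default(ith_distances))
```

Whether falling back to 0.0 is the right behavior for rho is a design question for the maintainers; the point of the guard is only that the size of non_zero_dists has to be checked before indexing it.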

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 15 (6 by maintainers)

Top GitHub Comments

1 reaction
jc-healy commented, Aug 28, 2019

I’ve got the dense case finished, and the few unit tests I’ve written seem to be working well. The next step is to get the sparse one finished (with more unit tests) and then the detection heuristics.

On Wed, Aug 28, 2019 at 8:14 AM Leland McInnes notifications@github.com wrote:

No, by that issue I meant, for example, having x points that are all identical, and an n_neighbors of less than x. This also lines up with related issues that can occur if you have a large number of points all tied for the same distance away from a given point, which can cause issues for NNDescent. Ideally, at a high level, we want to test relatively cheaply for such (pathological) cases and then either warn or provide an optional work-around. The solution to identical points is potentially to simply factor them out, embed the unique points, and then add the duplicates to the embedding as duplicates of the embedded points; this is the behaviour some users would like to see. The catch is that this is expensive to compute (dupe finding is hard in large datasets), potentially unnecessary for most cases, and has slightly surprising behaviour if you aren’t expecting it. The goal was to add an option to do duplicate handling, and the ability to throw a warning and suggest you try duplicate handling if potential issues are found. I have a colleague who was working on getting all of that done.
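The "factor out duplicates, embed the unique points, add the duplicates back" idea described above can be sketched roughly as follows. This is not UMAP's implementation: embed_with_duplicates and placeholder_embedder are hypothetical names, and the placeholder simply keeps the first two columns so that the example runs without UMAP installed. Any real embedder that maps an array of rows to an array of coordinates could be passed in instead.

```python
import numpy as np


def embed_with_duplicates(data, embed_unique):
    """Embed only the unique rows, then copy each result back to its duplicates."""
    # `inverse` maps every original row to the index of its unique row.
    unique_rows, inverse = np.unique(data, axis=0, return_inverse=True)
    # Flatten defensively, since the shape of `inverse` has varied across
    # NumPy releases when `axis` is given.
    inverse = np.asarray(inverse).reshape(-1)
    unique_embedding = np.asarray(embed_unique(unique_rows))
    # Fancy indexing gives each duplicate the coordinates of its unique row.
    return unique_embedding[inverse]


def placeholder_embedder(rows):
    # Stand-in for a real embedding step (an assumption for illustration):
    # keep the first two columns as the "embedding".
    return rows[:, :2].astype(float)


data = np.array(
    [
        [0.0, 0.0, 1.0],
        [2.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],  # duplicate of row 0
        [0.0, 0.0, 1.0],  # duplicate of row 0
    ]
)
embedding = embed_with_duplicates(data, placeholder_embedder)
print(embedding)
```

np.unique with axis=0 sorts the rows, which illustrates the cost mentioned in the comment: finding duplicates this way touches the whole dataset before any embedding work starts.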


0 reactions
sleighsoft commented, Sep 12, 2019

@lmcinnes Coming back to support for equidistant nearest neighbors. I have added a rough implementation and a small example in https://github.com/lmcinnes/umap/tree/equidistant_neighbors

When you run devel/test.py, it prints the mean distance, across multiple runs, from the center point [0,0] to its neighbors, which are all the same distance away from [0,0]. After embedding with UMAP, the master branch code yields a higher average distance between [0,0] and the points around it than the equidistant branch does.

Points are aligned like this

      x
      |
x --- o --- x
      |
      x

Where all x have a distance of 1 to o.
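For reference, the layout above can be written down as coordinates and checked numerically. This is only a sketch of the input arrangement, not the contents of devel/test.py; the variable names and the choice of axis-aligned unit offsets are assumptions.

```python
import numpy as np

# Center point "o" and the four "x" points placed one unit away along each axis.
center = np.array([0.0, 0.0])
neighbors = np.array(
    [
        [0.0, 1.0],   # top
        [-1.0, 0.0],  # left
        [1.0, 0.0],   # right
        [0.0, -1.0],  # bottom
    ]
)

# Euclidean distance from each neighbor to the center.
distances = np.linalg.norm(neighbors - center, axis=1)
all_equidistant = bool(np.allclose(distances, 1.0))
print(distances, all_equidistant)
```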
