Cannot use random projection trees

Problem description:

My dataset cannot be represented as an N_INSTANCES x N_DIMENSIONS table of floats: the instances are rows from the main table in a relational database.

Thus, my custom distance measure takes the ids of two instances (i1 and i2 in the code below) as its arguments, and computes the distance between the corresponding instances by accessing data from many tables (imagine computing the distance between two movies, given the actors that appeared in the movies, the genres of the movies, etc.).
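
To make the idea concrete, here is a minimal sketch of an id-based distance. The ACTORS mapping and its contents are hypothetical stand-ins for the relational lookups, and Jaccard distance is only one plausible choice of measure:

```python
# Hypothetical stand-in for a relational lookup: movie id -> set of actor ids.
ACTORS = {
    0: {"a1", "a2", "a3"},
    1: {"a2", "a3", "a4"},
    2: {"a5"},
}


def jaccard_distance(i1, i2):
    """Distance between two instances identified only by their ids."""
    s1, s2 = ACTORS[i1], ACTORS[i2]
    union = s1 | s2
    if not union:
        # Two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(s1 & s2) / len(union)
```

The ids themselves carry no geometry; all of the structure lives in the data they point to.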

Random projection trees therefore cannot help: randomly dividing instances according to their ids does not make sense.

For that reason I use pynndescent.NNDescent(..., tree_init=False). However, when index.prepare() is called, pynndescent still tries to grow some trees, and the minimal working example below raises an error (ValueError: No hyperplanes of adequate size were found!).

An ugly solution:

It turns out that passing each example as [id, id, ..., id] instead of [id] avoids the error (the more examples we have, the more copies of the id are necessary), but I still think that this behaviour of prepare() should be considered a bug.
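
A sketch of how the padded input could be built with NumPy. N_COPIES is an assumed value for illustration; how many copies are actually needed for a given number of examples is not established here:

```python
import numpy as np

N_INSTANCES = 100
N_COPIES = 8  # assumption: the required number of copies is not known in general

ids = np.arange(N_INSTANCES, dtype=np.float32)
# Each row is [id, id, ..., id]; a metric that reads only column 0 is unaffected.
xs_padded = np.repeat(ids[:, None], N_COPIES, axis=1)
```

Because the metric only looks at the first entry of each row, the extra columns change nothing about the distances; they only give the tree code more dimensions to work with.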

A possible solution [needs to be implemented]:

Maybe the search could start from a random node, from the closest of k random nodes, or from all k random nodes.
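
The "closest of k random nodes" variant could look roughly like this. It is a standalone sketch, not pynndescent code: closest_of_k_random and toy_metric are hypothetical names, and the metric takes plain ids rather than arrays:

```python
import numpy as np


def closest_of_k_random(query_id, n_points, dist, k=10, rng=None):
    """Return (node, distance) for the best of k randomly sampled nodes."""
    rng = np.random.default_rng() if rng is None else rng
    k = min(k, n_points)
    # Sample k distinct node ids, with no tree needed to choose them.
    candidates = rng.choice(n_points, size=k, replace=False)
    dists = np.array([dist(query_id, int(c)) for c in candidates])
    best = int(np.argmin(dists))
    return int(candidates[best]), float(dists[best])


# Toy id-based metric in the spirit of the example in this issue.
DATA = np.random.default_rng(1234).random(100).astype(np.float32)


def toy_metric(i1, i2):
    return abs(float(DATA[i1]) - float(DATA[i2]))


entry, entry_dist = closest_of_k_random(
    0, len(DATA), toy_metric, k=10, rng=np.random.default_rng(0)
)
```

The returned node would then serve as the starting point of the graph search in place of a leaf found by descending a tree.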

Traceback:

Traceback (most recent call last):
  File "C:/Users/.../fast_nn/mwe.py", line 31, in <module>
    index.prepare()
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\pynndescent_.py", line 1555, in prepare
    self._init_search_graph()
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\pynndescent_.py", line 963, in _init_search_graph
    for tree in rp_forest
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\pynndescent_.py", line 963, in <listcomp>
    for tree in rp_forest
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\rp_trees.py", line 1158, in convert_tree_format
    hyperplane_dim = dense_hyperplane_dim(tree.hyperplanes)
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\rp_trees.py", line 1140, in dense_hyperplane_dim
    raise ValueError("No hyperplanes of adequate size were found!")
ValueError: No hyperplanes of adequate size were found!

Minimal (non-)working example:

import pynndescent
import numpy as np
import numba

np.random.seed(1234)


N_INSTANCES = 100
DATA = np.random.random(N_INSTANCES).astype(np.float32)


@numba.njit(
    numba.types.float32(
        numba.types.Array(numba.types.float32, 1, "C", readonly=True),
        numba.types.Array(numba.types.float32, 1, "C", readonly=True),
    ),
    fastmath=True,
    locals={
        "i1": numba.types.int64,
        "i2": numba.types.int64,
    }
)
def my_metric(i1_array, i2_array):
    i1 = numba.int64(i1_array[0])
    i2 = numba.int64(i2_array[0])
    return np.abs(DATA[i1] - DATA[i2])


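# Each instance is a length-1 row holding its id; range(1) sets the number of id copies.
xs = np.array([[i for _ in range(1)] for i in range(N_INSTANCES)]).astype(np.float32)
# tree_init=False: do not use random projection trees for initialization.
index = pynndescent.NNDescent(xs, n_jobs=1, metric=my_metric, n_neighbors=1, tree_init=False)
index.prepare()  # raises ValueError: No hyperplanes of adequate size were found!
neighbors, distances = index.query(xs)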
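
For a dataset this small, the exact answer can be computed by brute force, which gives a reference to compare against once prepare() succeeds. A sketch using only NumPy, reusing DATA and the seed from the example above (exact_neighbors and exact_distances are illustrative names):

```python
import numpy as np

np.random.seed(1234)
N_INSTANCES = 100
DATA = np.random.random(N_INSTANCES).astype(np.float32)

ids = np.arange(N_INSTANCES)
# Pairwise |DATA[i] - DATA[j]| over all id pairs: the same distance my_metric computes.
pairwise = np.abs(DATA[ids][:, None] - DATA[ids][None, :])
exact_neighbors = np.argmin(pairwise, axis=1)
exact_distances = pairwise[ids, exact_neighbors]
```

Since every query point is also in the index, each exact nearest-neighbour distance is 0.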

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

lmcinnes commented, May 20, 2022 (1 reaction)

With regard to the changes you are proposing here. They seem to make sense, but I admit (without line references) I don’t entirely follow exactly what you are proposing. I would certainly welcome a PR with these changes if you can manage it.

lmcinnes commented, May 6, 2022 (1 reaction)

Ah yes, I think I see what the problem there would be. I think I can fix that; hopefully I’ll get to it in the next few days.
