Cannot use random projection trees
Problem description:
My dataset cannot be represented as an N_INSTANCES x N_DIMENSIONS table of floats: the instances are rows from the main table in a relational database.
Thus, my custom distance measure takes the ids of two instances (i1 and i2 in the code below) as its arguments, and computes the distance between the corresponding instances by accessing data from many tables (imagine computing the distance between two movies, given the actors that appeared in the movies, the genres of the movies, etc.).
This is why random projection trees cannot help: randomly dividing instances according to their ids does not make sense.
I therefore use pynndescent.NNDescent(..., tree_init=False). However, when index.prepare() is called, pynndescent still tries to grow some trees, and the minimal working example (below) raises an error (no hyperplanes found).
An ugly solution:
It turns out that providing examples as [id, id, ..., id] instead of [id] resolves the issue (the more examples we have, the more copies of the id are necessary), but I still think that the behaviour of prepare() should be considered a bug.
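For illustration, the padded rows described above could be built as in the sketch below. The helper name make_id_rows and the value n_copies=4 are hypothetical: the report does not say how many copies are needed for a given number of instances, so the count has to be found by trial.

```python
def make_id_rows(n_instances, n_copies):
    """Return one row per instance; each row repeats the instance id n_copies times."""
    return [[float(i)] * n_copies for i in range(n_instances)]


# n_copies=4 is an arbitrary illustration, not a recommended value.
rows = make_id_rows(100, 4)
```

The rows would then be converted with np.array(rows, dtype=np.float32) and passed as the data argument. A metric like my_metric below reads only the first element of each row, so the extra copies do not change the distances.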
A possible solution [needs to be implemented]:
Maybe the search could start from a random node, from the closest of k random nodes, or from all k random nodes.
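A minimal sketch of the second option (seed the search with the closest of k random nodes, with no tree involved) might look as follows. This is not pynndescent code: pick_entry_point, greedy_search, the adjacency-list graph, and the toy distance are all assumptions made for illustration, and a real implementation would keep a candidate heap rather than a single current node.

```python
import random


def pick_entry_point(n_nodes, query_id, distance, k=5, rng=random):
    """Sample up to k random node ids and return the one closest to the query."""
    candidates = rng.sample(range(n_nodes), min(k, n_nodes))
    return min(candidates, key=lambda node: distance(query_id, node))


def greedy_search(graph, query_id, distance, k=5, rng=random):
    """Walk the neighbor graph from a randomly seeded entry point.

    graph[i] lists the neighbor ids of node i. The walk moves to a closer
    node whenever one is found and stops when no neighbor improves.
    """
    current = pick_entry_point(len(graph), query_id, distance, k, rng)
    current_dist = distance(query_id, current)
    improved = True
    while improved:
        improved = False
        for neighbor in graph[current]:
            d = distance(query_id, neighbor)
            if d < current_dist:
                current, current_dist = neighbor, d
                improved = True
    return current, current_dist


# Toy data: ids index into a list of values, as in the example in this issue.
values = [i * 0.5 for i in range(20)]


def toy_distance(i1, i2):
    return abs(values[i1] - values[i2])


# Hypothetical chain graph: every node is linked to its left and right neighbor.
graph = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)] for i in range(len(values))]

result = greedy_search(graph, 13, toy_distance, k=3, rng=random.Random(0))
```

Because the entry point is chosen by calling the metric on ids only, nothing here needs the instances to live in a vector space.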
Traceback:
Traceback (most recent call last):
  File "C:/Users/.../fast_nn/mwe.py", line 31, in <module>
    index.prepare()
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\pynndescent_.py", line 1555, in prepare
    self._init_search_graph()
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\pynndescent_.py", line 963, in _init_search_graph
    for tree in rp_forest
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\pynndescent_.py", line 963, in <listcomp>
    for tree in rp_forest
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\rp_trees.py", line 1158, in convert_tree_format
    hyperplane_dim = dense_hyperplane_dim(tree.hyperplanes)
  File "C:\Users\...\pynnd\lib\site-packages\pynndescent\rp_trees.py", line 1140, in dense_hyperplane_dim
    raise ValueError("No hyperplanes of adequate size were found!")
ValueError: No hyperplanes of adequate size were found!
Minimal (not)working example:
import numba
import numpy as np
import pynndescent

np.random.seed(1234)
N_INSTANCES = 100
# Stand-in for the relational data: one value per instance id.
DATA = np.random.random(N_INSTANCES).astype(np.float32)


@numba.njit(
    numba.types.float32(
        numba.types.Array(numba.types.float32, 1, "C", readonly=True),
        numba.types.Array(numba.types.float32, 1, "C", readonly=True),
    ),
    fastmath=True,
    locals={
        "i1": numba.types.int64,
        "i2": numba.types.int64,
    },
)
def my_metric(i1_array, i2_array):
    # Each "vector" carries only an instance id in its first element.
    i1 = numba.int64(i1_array[0])
    i2 = numba.int64(i2_array[0])
    return np.abs(DATA[i1] - DATA[i2])


# One row per instance, holding a single copy of its id: shape (N_INSTANCES, 1).
xs = np.array([[i for _ in range(1)] for i in range(N_INSTANCES)]).astype(np.float32)
index = pynndescent.NNDescent(xs, n_jobs=1, metric=my_metric, n_neighbors=1, tree_init=False)
index.prepare()  # raises the ValueError shown in the traceback above
neighbors, distances = index.query(xs)
Top GitHub Comments
With regard to the changes you are proposing here. They seem to make sense, but I admit (without line references) I don’t entirely follow exactly what you are proposing. I would certainly welcome a PR with these changes if you can manage it.
Ah yes, I think I see what the problem there would be. I think I can fix that; hopefully I’ll get to it in the next few days.