[BUG] Error out when using 'cosine' distance metrics (ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types)

Hi,

I am actually using umap, but I know it uses pynndescent under the hood. When I run umap with > 10k rows, I get the following error:

numpy.core._exceptions.UFuncTypeError: ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types <class 'numpy.dtype[float32]'> -> None

This is a minimal reproducible example:

import numpy as np
import umap

print(100, umap.UMAP(metric="cosine").fit(np.random.random([100,10])).embedding_.shape)
print(1000, umap.UMAP(metric="cosine").fit(np.random.random([1000,10])).embedding_.shape)
print(10000, umap.UMAP(metric="cosine").fit(np.random.random([10000,10])).embedding_.shape)

This is the environment:

python 3.8.2

colorama      0.4.4  Cross-platform colored terminal text.
joblib        1.1.0  Lightweight pipelining with Python functions
llvmlite      0.34.0 lightweight wrapper around basic LLVM functionality
numba         0.51.2 compiling Python code using LLVM
numpy         1.22.0 NumPy is the fundamental package for array computing with Python.
pynndescent   0.5.5  Nearest Neighbor Descent
scikit-learn  1.0.2  A set of python modules for machine learning and data mining
scipy         1.6.1  SciPy: Scientific Library for Python
threadpoolctl 3.0.0  threadpoolctl
tqdm          4.62.3 Fast, Extensible Progress Meter
umap-learn    0.5.2  Uniform Manifold Approximation and Projection

This did not happen in the previous version of my application. I suspect it might be due to the new numpy version. However, I am also using hdbscan, and it does not work with any numpy version except 1.22.0.
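
Since the suspicion is a version mismatch, it can help to print the exact installed versions before comparing them with the listing above. The following is only a minimal sketch using the standard library; the helper name installed_versions is made up for illustration and is not part of any of these packages.

```python
from importlib import metadata  # standard library since Python 3.8

# Distribution names from the environment listing above, plus hdbscan,
# which the report also mentions.
PACKAGES = ["numpy", "numba", "llvmlite", "pynndescent", "umap-learn", "hdbscan"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<12} {version or 'not installed'}")
```

Running this in both the working and the failing environment makes it easier to see which package actually changed.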

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

3 reactions
jjsnlee commented, Jan 30, 2022

Another option (not having to mess around in an install env) is to do the following somewhere in your own code:

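# NOTE: correct_alternative_cosine must already be defined at this point, e.g. the
# numba.njit replacement quoted in the comment below.
import pynndescent
pynn_dist_fns_fda = pynndescent.distances.fast_distance_alternatives
pynn_dist_fns_fda["cosine"]["correction"] = correct_alternative_cosine
pynn_dist_fns_fda["dot"]["correction"] = correct_alternative_cosine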
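
For readers who want the runtime patch in one self-contained piece, here is a hedged sketch. It swaps in a plain-NumPy correction instead of the numba version discussed in this thread, which assumes that pynndescent calls the correction from ordinary Python code rather than from inside a compiled function; that assumption has not been verified here. The function name patch_pynndescent is hypothetical.

```python
import numpy as np


def correct_alternative_cosine(ds):
    """Map the log-scaled 'alternative' cosine distance d to 1 - 2**(-d).

    Plain-NumPy stand-in (an assumption of this sketch) for the numba
    version quoted in this thread; it works on arrays of any shape.
    """
    ds = np.asarray(ds)
    return 1.0 - np.power(2.0, -ds)


def patch_pynndescent():
    """Install the replacement for the 'cosine' and 'dot' metrics."""
    # Imported lazily so the function above can be used without pynndescent.
    import pynndescent.distances as pynn_dist

    alternatives = pynn_dist.fast_distance_alternatives
    alternatives["cosine"]["correction"] = correct_alternative_cosine
    alternatives["dot"]["correction"] = correct_alternative_cosine
```

Call patch_pynndescent() once, presumably before constructing umap.UMAP(metric="cosine"), so that the patched entry is the one that gets looked up.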

3 reactions
Kydlaw commented, Jan 19, 2022

Hello,

I just took the same bullet.

Environment:

  • Python 3.9.7
  • Numpy 1.22.0
  • Numba 0.53.0
  • Pynndescent 0.5.5
  • Umap 0.5.2

I used a higher version of Numpy as a fix to https://github.com/scikit-learn-contrib/hdbscan/issues/457.

Having that one fixed, I stumbled on this issue. So I tried the fix you suggested in:

[…] In umap/distances.py there is a function definition:

@numba.vectorize(fastmath=True)
def correct_alternative_cosine(d):
    return 1.0 - pow(2.0, -d)

If you change that to

@numba.njit(fastmath=True)
def correct_alternative_cosine(ds):
    result = np.empty_like(ds)
    for i in range(ds.shape[0]):
        # keep the minus sign so the result matches 1.0 - pow(2.0, -d)
        result[i] = 1.0 - np.power(2.0, -ds[i])
    return result

[…] you can just make this edit in your installed copy of umap in site-packages and have it work.

This change works for me. However, small correction here: the distance definition is not in umap/distances.py but in pynndescent/distances.py.

So, if you are using venv, apply the suggested changes in .venv/lib/pythonX.X/site-packages/pynndescent/distances.py.
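
If you are unsure where the installed copy lives, the path can be looked up instead of guessed. This is a small sketch using only the standard library; find_package_file is a hypothetical helper name, and it locates the package without importing it.

```python
import importlib.util
from pathlib import Path


def find_package_file(package, filename):
    """Return the path of `filename` inside an installed top-level package, or None."""
    # find_spec on a top-level name locates the package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


# Prints the file to edit, or None if pynndescent is not installed here.
print(find_package_file("pynndescent", "distances.py"))
```

Run it with the same interpreter that runs your application, so that it reports the copy that is actually imported.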
