
UMAP Intersections and unions are broken


Hi Leland,

Just wanted to point out that fresh installs of umap do not seem to be working entirely correctly. The intersection and union methods are the most obvious problem I’ve found; hopefully they are the only methods affected. Here is an example following the instructions in the UMAP docs (https://umap-learn.readthedocs.io/en/latest/composing_models.html) on MNIST:

[Image: scatter plot of the intersection_mapper embedding of MNIST]
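Conceptually, the intersection and union methods combine the fuzzy neighbor graphs of two fitted models. The sketch below assumes the product t-norm for intersection and the probabilistic sum for union, applied element-wise to small dense weight matrices. It is illustrative only: UMAP's own implementation works on sparse graphs and may weight or normalize the result differently, and the two graphs here are hypothetical toy values.

```python
import numpy as np


def fuzzy_intersection(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Product t-norm: an edge is strong only if it is strong in both graphs."""
    return a * b


def fuzzy_union(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Probabilistic sum: an edge is strong if it is strong in either graph."""
    return a + b - a * b


# Hypothetical edge weights for the same three points under two models
# (e.g. one fitted on the top half of each image, one on the bottom half).
top_graph = np.array([[0.0, 0.9, 0.1],
                      [0.9, 0.0, 0.5],
                      [0.1, 0.5, 0.0]])
bottom_graph = np.array([[0.0, 0.8, 0.7],
                         [0.8, 0.0, 0.0],
                         [0.7, 0.0, 0.0]])

print(fuzzy_intersection(top_graph, bottom_graph))
print(fuzzy_union(top_graph, bottom_graph))
```

Edges that only one model considers strong (such as the 0-2 and 1-2 pairs) are weakened by the intersection but kept by the union, which is why the two composed embeddings should look noticeably different.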

I’ve noticed this on multiple datasets and in multiple fresh conda environments, so either an update to UMAP went astray or an updated backend dependency is causing problems. Here is the yaml file I used to create a fresh conda environment:

channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python>=3.9
  - joblib>=1
  - tbb
  - hdbscan
  - matplotlib
  - numpy
  - seaborn
  - scikit-learn
  - scikit-bio
  - numba=0.53.0
  - pebble
  - biopython
  - pynndescent
  - threadpoolctl
  - imageio
  - umap-learn>=0.5
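Because only numba is pinned to an exact version in the yaml above, two "fresh" environments can resolve different versions of umap-learn, pynndescent, or scikit-learn. A minimal sketch for reporting what was actually installed, using the standard library's importlib.metadata (Python 3.8+); the package list is an assumption drawn from the dependencies above, and packages installed by conda usually, but not always, expose this metadata.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names to check; adjust to match your environment.
PACKAGES = ["umap-learn", "numba", "pynndescent", "scikit-learn", "numpy", "scipy"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:>14}: {ver if ver is not None else 'not installed'}")
```

Pasting this output into a bug report is a compact complement to a full conda list.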

And here is the code I used to generate the above plot:

import sklearn.datasets
import seaborn as sns
import umap
import matplotlib.pyplot as plt


# Load MNIST as a pandas DataFrame so that .iloc column slicing works
mnist = sklearn.datasets.fetch_openml("mnist_784", as_frame=True)
# Split the 784 pixel columns into the top 14 and bottom 14 rows of each 28x28 image
top = mnist.data.iloc[:, :28 * 14]
bottom = mnist.data.iloc[:, 28 * 14:]
# Fit one UMAP model per half of the image
top_mapper = umap.UMAP(random_state=42).fit(top)
bot_mapper = umap.UMAP(random_state=42).fit(bottom)
# Compose the two models: * takes the intersection (+ would take the union)
intersection_mapper = top_mapper * bot_mapper
# Color each point by its digit label (grey is a fallback for negative labels)
mnist_targets = [int(v) for v in mnist.target]
color_palette = sns.color_palette('Paired', max(mnist_targets) + 1)
cluster_colors = [
    color_palette[x] if x >= 0 else (0.5, 0.5, 0.5) for x in mnist_targets
]
fig = plt.figure()
ax = fig.add_subplot(111)

# Plot the 2D embedding produced by the composed model
ax.scatter(intersection_mapper.embedding_[:, 0],
           intersection_mapper.embedding_[:, 1],
           s=7,
           linewidth=0,
           c=cluster_colors,
           alpha=0.7)
plt.gca().set_aspect('equal', 'datalim')
plt.show()
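Rather than judging the embedding only by eye, one rough way to quantify the problem is to measure how often each point's nearest neighbors in the embedding share its digit label. The helper below is a hypothetical diagnostic, not part of UMAP; it uses scikit-learn's NearestNeighbors and assumes the embedding contains no exact duplicate points (so each point is its own first neighbor).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_label_agreement(embedding, labels, k=10):
    """Mean fraction of each point's k nearest neighbors (excluding itself)
    that share its label. Values near 1.0 mean classes stay well separated."""
    embedding = np.asarray(embedding)
    labels = np.asarray(labels)
    # Ask for k + 1 neighbors because each point is returned as its own neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embedding)
    _, indices = nn.kneighbors(embedding)
    # Drop the first column (the point itself) and look up neighbor labels.
    neighbor_labels = labels[indices[:, 1:]]
    return float(np.mean(neighbor_labels == labels[:, None]))
```

For example, knn_label_agreement(intersection_mapper.embedding_, mnist_targets) can be compared with the same score for top_mapper.embedding_; a composed model scoring far below its components would be consistent with the breakage reported here.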

I’ve attached the output of conda list as well so you can see all of the packages that have been installed: conda_freeze.txt

Hopefully this is a fairly easy fix, but please keep me posted and let me know if I can help in any way.

Cheers, Rhys

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

lmcinnes commented, Nov 30, 2021

Thanks, I suspect it is indeed related to #798. It must be an odd interaction. I’ll see if I can track it down.

rhysnewell commented, Nov 30, 2021

Nice, it seems to be fixed on my end. Thanks for the quick turnaround on that!

