`overflow encountered in true_divide` error when using Aligned UMAP

Hi! I’m a relatively new UMAP user, using Aligned UMAP to visualize the results of K-means clustering on a corpus of text documents across time.

Because many documents in each time window also appear in the next one, I build a dictionary of relations between consecutive windows and compute the pairwise distances between documents as follows:

def get_distance(similarity):
    # similarity: numpy array of pairwise TF-IDF similarity scores
    slice_dist = 1 - similarity
    slice_dist[slice_dist <= 0] = 0  # clip values that rounding pushed below 0
    return slice_dist


def get_relation(from_df, to_df):
    # Row positions of each document id in the two slices
    slice1_ids = from_df['ids'].reset_index().drop(['received'], axis=1)
    slice2_ids = to_df['ids'].reset_index().drop(['received'], axis=1)

    # Look up the later slice's row for each id, so rows are paired by id
    # rather than by the order in which shared ids happen to appear
    pos2 = dict(zip(slice2_ids['id'], slice2_ids.index))

    relation = {}
    for pos1, doc_id in zip(slice1_ids.index, slice1_ids['id']):
        if doc_id in pos2:
            relation[pos1] = pos2[doc_id]  # row in slice 1 -> row in slice 2

    return relation


# slices: dict of per-slice data keyed by consecutive integers;
# sliceKeys: those keys in order (both defined earlier in the notebook)
relations = []

for j, mat in slices.items():
    mat['distance'] = get_distance(mat['similarity'])

    if j > sliceKeys[0]:
        prev_mat = slices[j - 1]
        relations.append(get_relation(prev_mat, mat))

# Each time slice's distance matrix, in order, so that I have a list of distances
distances = [mat['distance'] for mat in slices.values()]
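For reference, here is a self-contained sketch of the same two steps on hypothetical toy data (three slices of made-up document ids, with random symmetric matrices standing in for real TF-IDF similarities). The helpers `distance_from_similarity` and `relation_by_id` are illustrative names, not the code from the question: the first uses `np.clip`, which is equivalent to the in-place clipping above, and the second takes plain id lists instead of DataFrames. It pairs rows by looking each shared id up explicitly, so the result does not depend on both slices listing shared documents in the same order. The output has the shape AlignedUMAP's `relations` argument takes: one dict per pair of consecutive slices, mapping a row index in one slice to the row index of the same document in the next.

```python
import numpy as np


def distance_from_similarity(similarity):
    # Hypothetical helper: 1 - similarity, with values below 0 clipped to 0
    return np.clip(1.0 - similarity, 0.0, None)


def relation_by_id(ids_prev, ids_next):
    # Hypothetical helper: map row position in the earlier slice to row
    # position in the later slice for every id present in both slices
    pos_next = {doc_id: j for j, doc_id in enumerate(ids_next)}
    return {
        i: pos_next[doc_id]
        for i, doc_id in enumerate(ids_prev)
        if doc_id in pos_next
    }


# Toy data (made up): document ids per time slice, plus a random symmetric
# similarity matrix per slice standing in for TF-IDF similarities
rng = np.random.default_rng(0)
slice_ids = [["a", "b", "c"], ["c", "a", "d", "e"], ["e", "f", "c"]]
similarities = []
for ids in slice_ids:
    s = rng.random((len(ids), len(ids)))
    s = (s + s.T) / 2          # make it symmetric
    np.fill_diagonal(s, 1.0)   # each document is identical to itself
    similarities.append(s)

distances = [distance_from_similarity(s) for s in similarities]
relations = [
    relation_by_id(slice_ids[k], slice_ids[k + 1])
    for k in range(len(slice_ids) - 1)
]

print(relations)  # [{0: 1, 2: 0}, {0: 2, 3: 0}]
```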

My Aligned UMAP settings are as follows:

import umap

aligned_mapper = umap.AlignedUMAP(
    n_neighbors=5,
    min_dist=0.05,
).fit(distances, relations=relations)

My distances array looks like this: [screenshot not preserved]

Previously this approach gave me no issues. However, while testing new results I’ve been getting the warnings below over and over:

/Users/bianchi_dy/opt/anaconda3/lib/python3.7/site-packages/umap/spectral.py:256: UserWarning: WARNING: spectral initialisation failed! The eigenvector solver
failed. This is likely due to too small an eigengap. Consider
adding some noise or jitter to your data.

Falling back to random initialisation!
  "WARNING: spectral initialisation failed! The eigenvector solver\n"
/Users/bianchi_dy/opt/anaconda3/lib/python3.7/site-packages/umap/umap_.py:905: RuntimeWarning: overflow encountered in true_divide
  result[n_samples > 0] = float(n_epochs) / n_samples[n_samples > 0]

followed by this traceback. Does it mean I’m dividing by zero somewhere I shouldn’t be?

--------------------
LinAlgError                               Traceback (most recent call last)
<timed exec> in <module>

~/opt/anaconda3/lib/python3.7/site-packages/umap/aligned_umap.py in fit(self, X, y, **fit_params)
    357                     embeddings[-1],
    358                     next_embedding,
--> 359                     np.vstack([left_anchors, right_anchors]),
    360                 )
    361             )

~/opt/anaconda3/lib/python3.7/site-packages/numba/np/linalg.py in _check_finite_matrix()
    751         if not np.isfinite(v.item()):
    752             raise np.linalg.LinAlgError(
--> 753                 "Array must not contain infs or NaNs.")
    754 
    755 

LinAlgError: Array must not contain infs or NaNs.

Any ideas about what might be causing this error or how to fix it? I suspect it has to do with the distances, but I’m not sure whether I need some normalization or preprocessing beyond turning TF-IDF similarity scores into distances. Unfortunately, this came up the night before a deadline for which I was planning to use Aligned UMAP, so I’d be grateful for any pointers toward a fix, even a hacky one.
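One way to rule the inputs in or out before fitting is a quick pre-flight check like the sketch below. `check_aligned_inputs` is a hypothetical helper, not part of umap. It assumes, as in the question, that each slice is a square pairwise distance matrix, and that there is one relation dict per pair of consecutive slices (`len(distances) - 1` in total), which is how AlignedUMAP links slices. A clean report only says the inputs look sane; it cannot rule out a problem inside the library itself, which is what the maintainers' comments in this thread point to.

```python
import numpy as np


def check_aligned_inputs(distances, relations):
    """Hypothetical pre-flight check (not part of umap).

    Returns a list of problems found; an empty list means none were found.
    """
    problems = []
    if len(relations) != len(distances) - 1:
        problems.append(
            f"expected {len(distances) - 1} relation dicts, got {len(relations)}"
        )
    for k, d in enumerate(distances):
        d = np.asarray(d, dtype=float)
        if d.ndim != 2 or d.shape[0] != d.shape[1]:
            problems.append(f"slice {k}: expected a square matrix, got shape {d.shape}")
        if not np.isfinite(d).all():
            problems.append(f"slice {k}: contains inf or NaN")
    # Only check relations that have both a source and a target slice
    for k, rel in enumerate(relations[: max(len(distances) - 1, 0)]):
        n_from, n_to = len(distances[k]), len(distances[k + 1])
        for i, j in rel.items():
            if not (0 <= i < n_from and 0 <= j < n_to):
                problems.append(f"relation {k}: pair {i} -> {j} is out of range")
                break  # one report per relation dict is enough
    return problems


good = [np.zeros((3, 3)), np.zeros((2, 2))]
bad = [np.array([[0.0, np.inf], [np.inf, 0.0]]), np.zeros((2, 2))]
print(check_aligned_inputs(good, [{0: 1, 2: 0}]))  # []
print(check_aligned_inputs(bad, [{0: 5}]))         # two problems reported
```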

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
GregDemand commented, Jun 17, 2022

I’ve fixed this issue with pull request #875. Basically the problem was in umap_.py line 919:

result[n_samples > 0] = float(n_epochs) / n_samples[n_samples > 0]

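where the guard part of the statement didn’t match the calculation: a very small but positive float32 value in n_samples passes the n_samples > 0 check, yet dividing n_epochs by it overflows float32 and produces inf. The easiest fix was casting n_samples from np.float32 to np.float64 to match the type of result.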

result[n_samples > 0] = float(n_epochs) / np.float64(n_samples[n_samples > 0])

This could have alternatively been fixed by refining the guard part of the statement to something like:

result[n_samples/n_epochs > 0] = float(n_epochs) / n_samples[n_samples/n_epochs > 0]

but that solution looks worse.
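To see why the float64 cast helps, here is a small NumPy sketch. `epochs_per_sample` is a hypothetical, simplified re-creation of the calculation around the quoted line, not umap's actual source; it assumes, as the fix implies, that `result` is float64 while `n_samples` stays float32. A normalized weight that is tiny but nonzero passes the `> 0` guard, yet `n_epochs` divided by it exceeds the float32 range (about 3.4e38), so the division returns `inf` rather than failing as a divide-by-zero. Casting the divisor to float64 (`astype` here, the same idea as `np.float64(...)` in the fix) keeps the result finite.

```python
import numpy as np


def epochs_per_sample(weights, n_epochs, cast_to_float64):
    # Hypothetical simplified re-creation, for illustration only
    result = -1.0 * np.ones(weights.shape[0], dtype=np.float64)
    n_samples = n_epochs * (weights / weights.max())  # stays float32
    mask = n_samples > 0
    denom = n_samples[mask]
    if cast_to_float64:
        denom = denom.astype(np.float64)  # same idea as np.float64(...) in the fix
    result[mask] = float(n_epochs) / denom
    return result


# Edge weights stored as float32; one is tiny but still positive
weights = np.array([1.0, 0.5, 1e-39, 0.0], dtype=np.float32)

with np.errstate(over="ignore"):  # silence the overflow warning for the demo
    before = epochs_per_sample(weights, 200, cast_to_float64=False)
after = epochs_per_sample(weights, 200, cast_to_float64=True)

print(before)  # third entry is inf: 200 / (tiny float32) overflowed
print(after)   # all entries finite; the zero weight keeps the -1 sentinel
```

The third entry of `before` is the situation the `overflow encountered in true_divide` warning reports: an overflow to `inf`, not a division by zero.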

0 reactions
lmcinnes commented, Nov 16, 2021

Thanks for the reproducer. I’ll try to look into this when I get a little time.
