Recursion Error (Different from Previous Post)
EDIT: Is this just a version issue? RecursionError was introduced in Python 3.5; before that, exceeding the recursion limit raised a plain RuntimeError.
I’m working with a large dataset (~1,700,000 rows × ~400 columns), and I’m getting the following error:
```
Traceback (most recent call last):
  File "umapping.py", line 25, in <module>
    u = umap.UMAP(metric="correlation").fit_transform(data)
  File "/users/nicolerg/anaconda2/lib/python2.7/site-packages/umap/umap_.py", line 1573, in fit_transform
    self.fit(X)
  File "/users/nicolerg/anaconda2/lib/python2.7/site-packages/umap/umap_.py", line 1534, in fit
    self.verbose
  File "/users/nicolerg/anaconda2/lib/python2.7/site-packages/umap/umap_.py", line 559, in rptree_leaf_array
    except RecursionError:
NameError: global name 'RecursionError' is not defined
```
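The last frame is consistent with the EDIT above: umap_.py catches RecursionError, a built-in that only exists from Python 3.5 onward, while this install runs under Python 2.7 (anaconda2). Python only evaluates the name in an except clause when something is actually raised inside the try block, so the NameError suggests that some other exception (plausibly a RuntimeError from hitting the recursion limit) was raised there and then masked. A common compatibility shim is sketched below. It is an assumption about how one might patch this, not the UMAP maintainers' fix, and depth/safe_depth are hypothetical helpers for illustration only.

```python
import sys

# Compatibility shim: Python < 3.5 has no RecursionError built-in, so alias it
# to RuntimeError (the exception those versions raise on deep recursion).
try:
    RecursionError
except NameError:  # only reached on Python < 3.5
    RecursionError = RuntimeError


def depth(n):
    """Hypothetical helper: recurse n levels deep and return n."""
    return 0 if n == 0 else 1 + depth(n - 1)


def safe_depth(n):
    """Hypothetical helper: return depth(n), or None if the recursion limit is hit."""
    try:
        return depth(n)
    except RecursionError:  # resolves on both Python 2.7 and 3.x thanks to the shim
        return None


print(safe_depth(10))                           # within the limit
print(safe_depth(sys.getrecursionlimit() + 100))  # exceeds the limit, caught
```

On Python 3.5+ the try branch succeeds and no alias is created; on 2.7 the same except clause catches the RuntimeError, since RecursionError is itself a subclass of RuntimeError in Python 3.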
This is the relevant bit of code:
```python
import numpy as np
import pandas as pd
import umap

data = pd.read_csv(args.input, compression="gzip", header=1, sep=",")
nrows = len(data)
colors = np.random.rand(nrows, 3)  # RGB colors
u = umap.UMAP(metric="correlation", n_neighbors=25).fit_transform(data)
```
I increased n_neighbors from the default of 15 to 25 to see if that would help, but I got the same error. I don't expect my data to contain duplicate (equivalent) rows. I am trying to cluster the ~1,700,000 instances in ~400 dimensions. Any suggestions?
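Rather than relying on the expectation that there are no equivalent rows, it is cheap to check directly with pandas. The sketch below assumes the data is already loaded as a DataFrame (as in the snippet above); count_duplicate_rows and the tiny example frame are hypothetical, for illustration only, not the poster's data.

```python
import pandas as pd


def count_duplicate_rows(df):
    """Hypothetical helper: count rows that exactly duplicate an earlier row."""
    # duplicated() marks every repeat after the first occurrence as True.
    return int(df.duplicated().sum())


# Tiny illustrative frame (hypothetical data): the third row repeats the first.
example = pd.DataFrame({"a": [1.0, 2.0, 1.0], "b": [3.0, 4.0, 3.0]})
print(count_duplicate_rows(example))  # 1
```

If the count on the real data is nonzero, data.drop_duplicates() removes the repeats before fitting.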
Issue created 6 years ago; 11 comments (6 by maintainers).
Top GitHub Comments
I wanted to provide an update on this. I know you have moved on to other issues, but just in case you run into another project where UMAP might be useful…
I believe this issue has finally been resolved. It proved remarkably tricky to track down, but in the end it was a bug in the SGD optimization of the layout. The latest master branch (v0.2.0+) has a new SGD layout optimization algorithm that does not encounter this issue and should produce much better-looking embeddings for large datasets.
Looks resolved. Closing.