spectral_embedding tests fail on 64-bit Little Endian PowerPC (ppc64le)
On a Debian buildd host for 64-bit Little Endian PowerPC (ppc64le, which Debian calls ppc64el), three tests involving spectral_embedding() fail (two of them were recently added in commit https://github.com/scikit-learn/scikit-learn/commit/e52e9c8d7536b6315da655164951060642a52707):
sklearn/cluster/tests/test_spectral.py::test_precomputed_nearest_neighbors_filtering FAILED
sklearn/manifold/tests/test_spectral_embedding.py::test_precomputed_nearest_neighbors_filtering FAILED
sklearn/tests/test_common.py::test_estimators[SpectralEmbedding()-check_pipeline_consistency] FAILED
In contrast, on 64-bit Big Endian PowerPC (ppc64) all of the tests pass (build log), so is this perhaps an endianness issue?
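For context, the native byte order on a build host can be confirmed directly from Python/NumPy; a minimal sketch (not taken from the build log):

import sys
import numpy as np

# 'little' on ppc64el, 'big' on ppc64
print(sys.byteorder)

# '=' means the dtype uses the platform's native byte order
print(np.dtype(np.float64).byteorder)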
From the build log on the failing ppc64el:
=================================== FAILURES ===================================
_________________ test_precomputed_nearest_neighbors_filtering _________________
    def test_precomputed_nearest_neighbors_filtering():
        # Test precomputed graph filtering when containing too many neighbors
        X, y = make_blobs(n_samples=200, random_state=0,
                          centers=[[1, 1], [-1, -1]], cluster_std=0.01)
        n_neighbors = 2
        results = []
        for additional_neighbors in [0, 10]:
            nn = NearestNeighbors(
                n_neighbors=n_neighbors + additional_neighbors).fit(X)
            graph = nn.kneighbors_graph(X, mode='connectivity')
            labels = SpectralClustering(random_state=0, n_clusters=2,
                                        affinity='precomputed_nearest_neighbors',
                                        n_neighbors=n_neighbors).fit(graph).labels_
            results.append(labels)
>       assert_array_equal(results[0], results[1])
E AssertionError:
E Arrays are not equal
E
E Mismatch: 49.5%
E Max absolute difference: 1
E Max relative difference: 1.
E x: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
E 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
E y: array([1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1,
E 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,
E 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,...
sklearn/cluster/tests/test_spectral.py:122: AssertionError
_________________ test_precomputed_nearest_neighbors_filtering _________________
    def test_precomputed_nearest_neighbors_filtering():
        # Test precomputed graph filtering when containing too many neighbors
        n_neighbors = 2
        results = []
        for additional_neighbors in [0, 10]:
            nn = NearestNeighbors(
                n_neighbors=n_neighbors + additional_neighbors).fit(S)
            graph = nn.kneighbors_graph(S, mode='connectivity')
            embedding = SpectralEmbedding(random_state=0, n_components=2,
                                          affinity='precomputed_nearest_neighbors',
                                          n_neighbors=n_neighbors
                                          ).fit(graph).embedding_
            results.append(embedding)
>       assert_array_equal(results[0], results[1])
E AssertionError:
E Arrays are not equal
E
E Mismatch: 100%
E Max absolute difference: 0.23411947
E Max relative difference: 763.22760149
E x: array([[-0.030262, 0.035582],
E [ 0.032586, -0.007568],
E [-0.033157, 0.033655],...
E y: array([[ 0.040137, -0.037808],
E [-0.004603, -0.024269],
E [ 0.032301, -0.00754 ],...
sklearn/manifold/tests/test_spectral_embedding.py:159: AssertionError
_______ test_estimators[SpectralEmbedding()-check_pipeline_consistency] ________
estimator = SpectralEmbedding(affinity='nearest_neighbors', eigen_solver=None, gamma=None,
                              n_components=2, n_jobs=None, n_neighbors=None,
                              random_state=None)
check = functools.partial(<function check_pipeline_consistency at 0x7fff84bd4ee0>, 'SpectralEmbedding')

    @parametrize_with_checks(_tested_estimators())
    def test_estimators(estimator, check):
        # Common tests for estimator instances
        with ignore_warnings(category=(FutureWarning,
                                       ConvergenceWarning,
                                       UserWarning, FutureWarning)):
            _set_checking_parameters(estimator)
>           check(estimator)
sklearn/tests/test_common.py:101:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/_testing.py:327: in wrapper
return fn(*args, **kwargs)
sklearn/utils/estimator_checks.py:1285: in check_pipeline_consistency
assert_allclose_dense_sparse(result, result_pipe)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
x = array([[ 9.81790160e-02, -1.68785957e-15],
[ 1.59037212e-01, -9.43588019e-02],
[ 1.59037212e-01, 2.6359....81790160e-02, 2.94383302e-16],
[ 9.81790160e-02, 3.86709855e-16],
[ 1.59037212e-01, -2.12784505e-01]])
y = array([[-1.34083137e-01, -1.51153284e-15],
[ 6.32880953e-02, -9.43588019e-02],
[ 6.32880953e-02, 2.6359....34083137e-01, -4.71346606e-16],
[-1.34083137e-01, 3.63755266e-16],
[ 6.32880953e-02, -2.12784505e-01]])
rtol = 1e-07, atol = 1e-09, err_msg = ''
    def assert_allclose_dense_sparse(x, y, rtol=1e-07, atol=1e-9, err_msg=''):
        """Assert allclose for sparse and dense data.
        Both x and y need to be either sparse or dense, they
        can't be mixed.
        Parameters
        ----------
        x : array-like or sparse matrix
            First array to compare.
        y : array-like or sparse matrix
            Second array to compare.
        rtol : float, optional
            relative tolerance; see numpy.allclose
        atol : float, optional
            absolute tolerance; see numpy.allclose. Note that the default here is
            more tolerant than the default for numpy.testing.assert_allclose, where
            atol=0.
        err_msg : string, default=''
            Error message to raise.
        """
        if sp.sparse.issparse(x) and sp.sparse.issparse(y):
            x = x.tocsr()
            y = y.tocsr()
            x.sum_duplicates()
            y.sum_duplicates()
            assert_array_equal(x.indices, y.indices, err_msg=err_msg)
            assert_array_equal(x.indptr, y.indptr, err_msg=err_msg)
            assert_allclose(x.data, y.data, rtol=rtol, atol=atol, err_msg=err_msg)
        elif not sp.sparse.issparse(x) and not sp.sparse.issparse(y):
            # both dense
>           assert_allclose(x, y, rtol=rtol, atol=atol, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=1e-09
E
E Mismatch: 50%
E Max absolute difference: 0.31393279
E Max relative difference: 7.46490707
E x: array([[ 9.817902e-02, -1.687860e-15],
E [ 1.590372e-01, -9.435880e-02],
E [ 1.590372e-01, 2.635916e-01],...
E y: array([[-1.340831e-01, -1.511533e-15],
E [ 6.328810e-02, -9.435880e-02],
E [ 6.328810e-02, 2.635916e-01],...
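For anyone trying to reproduce the SpectralEmbedding failure outside the test suite, here is a self-contained sketch of the second failing comparison. The original test relies on a module-level dataset S defined elsewhere in test_spectral_embedding.py; the make_blobs data below is only a hypothetical stand-in for it:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in for the module-level S used by the real test
S, _ = make_blobs(n_samples=200, random_state=0,
                  centers=[[1, 1], [-1, -1]], cluster_std=0.01)

n_neighbors = 2
results = []
for additional_neighbors in [0, 10]:
    nn = NearestNeighbors(n_neighbors=n_neighbors + additional_neighbors).fit(S)
    graph = nn.kneighbors_graph(S, mode='connectivity')
    embedding = SpectralEmbedding(random_state=0, n_components=2,
                                  affinity='precomputed_nearest_neighbors',
                                  n_neighbors=n_neighbors).fit(graph).embedding_
    results.append(embedding)

# Expected to be (near) zero; on the ppc64el builder the two runs differ
print(np.abs(results[0] - results[1]).max())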
From the same build log, here is the output of show_versions():
System:
python: 3.8.2rc1 (default, Feb 11 2020, 15:26:48) [GCC 9.2.1 20200203]
executable: /usr/bin/python3.8
machine: Linux-4.19.0-8-powerpc64le-ppc64le-with-glibc2.29
Python dependencies:
pip: None
setuptools: 44.0.0
sklearn: 0.22.1
numpy: 1.17.4
scipy: 1.3.3
Cython: 0.29.14
pandas: 0.25.3
matplotlib: 3.1.2
joblib: 0.14.0
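For reference, the same environment report can be regenerated on any host with scikit-learn's built-in helper:

import sklearn

# Prints the System and Python dependencies sections shown above
sklearn.show_versions()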
Top GitHub Comments
Well, the easy part is confirming that it’s an issue with our scipy build:
Thanks a lot, @ckastner, for the feedback. This can be closed for now.
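As a possible next step on the affected hardware, one could exercise SciPy's symmetric eigensolver directly on a small graph Laplacian, bypassing scikit-learn entirely, and diff the output against an x86-64 or ppc64 host. This is only a hypothetical diagnostic sketch, not something that was run for this issue:

import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

# Small k-NN affinity graph similar to the one used by the failing tests
X, _ = make_blobs(n_samples=200, random_state=0,
                  centers=[[1, 1], [-1, -1]], cluster_std=0.01)
graph = kneighbors_graph(X, n_neighbors=2, mode='connectivity')
graph = 0.5 * (graph + graph.T)          # symmetrize the connectivity graph

# Normalized graph Laplacian, solved densely so only SciPy/LAPACK is involved
L = laplacian(graph, normed=True).toarray()
vals, vecs = eigh(L)

# The smallest eigenvalues/eigenvectors drive the spectral embedding;
# printing them with high precision allows a diff across architectures.
np.set_printoptions(precision=17)
print(vals[:5])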