Projecting numpy masked arrays returns plain numpy arrays

While upgrading an older system from pyproj 2.6.1 to the 3.x series, I found that projecting arrays of coordinates given as numpy masked arrays behaves differently between the two: pyproj 2.6.1 returned masked arrays of the projected coordinates, whereas as of the 3.x series the returned arrays are no longer masked (i.e. they are “plain” numpy arrays). I’m not sure whether this is an intentional change or a regression of some kind. I couldn’t find a GH issue relating to this problem, so I thought I’d let you know about it. I bisected the issue and found that the commit where the problem first appears is 4ab3ff7cf2e3ff089509b921f103dd2fe57ddfda.

If you require any more information (beyond that mentioned below), please just let me know and I’ll be more than happy to provide it.

Code Sample

Here’s a sample piece of code which illustrates the problem (referred to as test_projected_masked_arrays.py further below):

# -*- coding: utf-8 -*-

import numpy as np
from pyproj import Proj
from unittest import TestCase


class TestProjectMaskedArrays(TestCase):
    def test_projected_masked_array_is_masked(self):
        lat = np.ma.array(data=[30, 35, 40, 45], mask=[0, 0, 1, 0])
        lon = np.ma.array(data=[0, 5, 10, 15], mask=[0, 0, 1, 0])

        proj = Proj('+ellps=WGS84 +proj=stere +lat_0=75.0 '
                    '+lon_0=-14.0 +x_0=0.0 +y_0=0.0 +no_defs')

        x, y = proj(lon, lat)

        self.assertTrue(hasattr(x, "mask"))
        self.assertTrue(hasattr(y, "mask"))

# vim: expandtab shiftwidth=4 softtabstop=4

Setting up a virtual environment and installing base packages (assuming a Debian-bullseye system with Python 3.9):

$ virtualenv --python=/usr/bin/python3 venv
$ source venv/bin/activate
$ pip install numpy==1.21.6 pytest==7.1.2
$ pip install pyproj==2.6.1 && pytest test_projected_masked_arrays.py  # passes
$ pip install pyproj==3.0.0 && pytest test_projected_masked_arrays.py  # fails
# last good commit
$ pip install --force-reinstall git+https://github.com/pyproj4/pyproj.git@8eb145e13ba8133ba624fa2cc1cbc0a0733d69c0 && pytest test_projected_masked_arrays.py  # passes
# first bad commit
$ pip install --force-reinstall git+https://github.com/pyproj4/pyproj.git@4ab3ff7cf2e3ff089509b921f103dd2fe57ddfda && pytest test_projected_masked_arrays.py  # fails

Problem description

My expectation is that the masked array behaviour from the 2.x pyproj series would continue (this could be an incorrect expectation; I’m not sure!). Specifically, I expect that projecting masked arrays would return masked arrays.

Expected Output

I would expect that after executing

x, y = proj(lon, lat)

where lon and lat are numpy masked arrays, x and y would have the mask attribute, i.e. that both hasattr(x, "mask") and hasattr(y, "mask") would return True.
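Until the behaviour is settled, one possible workaround is to project the raw data and re-apply the mask afterwards. The sketch below is a hypothetical helper (project_masked is not part of pyproj); it assumes transform is any callable taking (lon, lat) and returning an (x, y) pair, such as the Proj instance in the test above, and that values at masked positions can safely be passed through the projection because their results are hidden by the mask.

```python
import numpy as np


def project_masked(transform, lon, lat):
    """Hypothetical helper: call transform(lon, lat) and re-apply the input mask.

    transform is assumed to be any callable returning an (x, y) pair,
    e.g. a pyproj.Proj instance.
    """
    # A point masked in either input is masked in both outputs.
    mask = np.ma.getmaskarray(lon) | np.ma.getmaskarray(lat)
    # Pass the underlying plain data; results at masked positions are hidden below.
    x, y = transform(np.ma.getdata(lon), np.ma.getdata(lat))
    # Give each output its own copy of the mask so they cannot affect each other.
    return np.ma.array(x, mask=mask), np.ma.array(y, mask=mask.copy())


lat = np.ma.array(data=[30, 35, 40, 45], mask=[0, 0, 1, 0])
lon = np.ma.array(data=[0, 5, 10, 15], mask=[0, 0, 1, 0])

# Stand-in transform so the example runs without pyproj installed.
x, y = project_masked(lambda a, b: (a * 2.0, b + 1.0), lon, lat)
print(x)
print(y)
```

With pyproj installed, the call would be x, y = project_masked(proj, lon, lat). Combining the two input masks with | is a design choice: a coordinate pair is only meaningful if both its longitude and latitude are valid.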

Environment Information

  • Output from: python -m pyproj -v
pyproj info:
    pyproj: 3.0.dev0
      PROJ: 7.2.1
  data dir: /usr/share/proj

System:
    python: 3.9.2 (default, Feb 28 2021, 17:03:44)  [GCC 10.2.1 20210110]
executable: /venv/bin/python
   machine: Linux-5.10.0-0.bpo.8-amd64-x86_64-with-glibc2.31

Python deps:
       pip: 20.3.4
setuptools: 44.1.1
    Cython: None
  • PROJ version (python -c "import pyproj; print(pyproj.proj_version_str)")
7.2.1
  • PROJ data directory (python -c "import pyproj; print(pyproj.datadir.get_data_dir())")
/usr/share/proj
  • Python version (python -c "import sys; print(sys.version.replace('\n', ' '))")
3.9.2 (default, Feb 28 2021, 17:03:44)  [GCC 10.2.1 20210110]
  • Operation System Information (python -c "import platform; print(platform.platform())")
Linux-5.10.0-0.bpo.8-amd64-x86_64-with-glibc2.31

Installation method

Via pip in a virtual environment.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

paultcochrane commented, Jul 10, 2022 (1 reaction)

After a bit more hunting I found that checking for the mask attribute might not be a precise enough test: checking with isinstance() whether the array returned by _copytobuffer() is a numpy.ma.MaskedArray seems to be a clearer description of the expectation.

For instance, with pyproj 2.6.1 it’s possible to show that _copytobuffer() returns a masked array when given a masked array as its argument:

from pyproj.utils import _copytobuffer
import numpy

in_arr = numpy.ma.array([1])
# _copytobuffer() returns a tuple; its first element is the copied array
out_arr = _copytobuffer(in_arr)

isinstance(out_arr[0], numpy.ma.MaskedArray)  # => True

In pyproj 3.x the isinstance() call returns False.

To cut a long story short: I’ll add a test along these lines 😃
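For context, the numpy-only snippet below illustrates how an array conversion can silently drop a mask, and why an isinstance() check is a stricter test than looking for a mask attribute. It does not use pyproj, and which conversion pyproj 3.x actually performs internally is not stated in this issue; the snippet is only an illustration of general numpy behaviour.

```python
import numpy as np

masked = np.ma.array([30.0, 35.0, 40.0, 45.0], mask=[0, 0, 1, 0])

# Array methods such as copy() and astype() preserve the MaskedArray subclass.
copied = masked.copy(order="C").astype("d")

# np.asarray() (like np.array() with its default subok=False) returns a base
# ndarray, so the subclass, and with it the mask, is dropped.
plain = np.asarray(masked)

# np.asanyarray() passes ndarray subclasses such as MaskedArray through.
kept = np.asanyarray(masked)

for name, arr in [("copy+astype", copied), ("asarray", plain), ("asanyarray", kept)]:
    # Columns: conversion, isinstance(..., MaskedArray), hasattr(..., "mask")
    print(name, isinstance(arr, np.ma.MaskedArray), hasattr(arr, "mask"))
```

A base ndarray has no mask attribute at all, so in this example both checks agree; isinstance() simply states the expectation (a MaskedArray comes back) more directly.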

snowman2 commented, Jul 9, 2022 (1 reaction)

Thanks 👍. I recommend adding a test with a masked array in this file: https://github.com/pyproj4/pyproj/blob/main/test/test_utils.py
