DOC: Improve the description of the `dtype` parameter in `numpy.array` docstring
The description of the dtype parameter in the numpy.array docstring looks as follows:
dtype : data-type, optional
The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to ‘upcast’ the array. For downcasting, use the .astype(t) method.
which is rather misleading, because in reality the behavior is different. For example, for integers the type chain on Windows is (int32 -> int64 -> uint64 -> object); in general it starts with np.int_. Also, there is no np.uint32 in this chain, nor np.int8, and so on.
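A minimal sketch of how the chain can be observed, assuming a small helper named inferred_dtypes (hypothetical, not part of numpy) that asks np.array for the dtype it infers for Python integers of increasing magnitude. The first entry is platform and version dependent, so no fixed output is shown:

```python
import numpy as np


def inferred_dtypes(values):
    """Map each Python int to the dtype np.array infers for it (hypothetical helper)."""
    return {v: np.array([v]).dtype for v in values}


# Values chosen to cross the int32, int64 and uint64 limits in turn.
chain = inferred_dtypes([1, 2**31, 2**63, 2**64])
for value, dtype in chain.items():
    print(f"{value:>22} -> {dtype}")
```

Small types such as int8 or uint32 never show up here, even though 1 would fit into them, so "minimum type required" does not describe what happens.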
One more thought, this behavior is also inconsistent with the following snippet:
>>> (np.array(1, np.int32) + np.array(1, np.uint64)).dtype
dtype('float64') #instead of dtype('O')
Thus in np.array numpy follows Python’s rules, while in expressions it follows C’s rules. But even those it follows only selectively:
>>> (np.array([0x7fffffff]) + np.array([0x7fffffff])).dtype
dtype('int32') #instead of dtype('uint32')
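Both snippets can be reproduced in a platform-independent way, as a rough sketch, by asking np.promote_types for the mixed case and by passing dtype=np.int32 explicitly for the overflow case (the explicit dtype is my addition, so that the result does not depend on what np.int_ is on a given system):

```python
import numpy as np

# Mixed signed/unsigned promotion: no integer dtype covers both int32 and
# uint64, so numpy falls back to a float rather than to object.
mixed = np.promote_types(np.int32, np.uint64)
print(mixed)

# Same-dtype addition keeps the dtype and wraps around on overflow
# instead of moving to a wider or unsigned type.
a = np.array([0x7fffffff], dtype=np.int32)
wrapped = a + a
print(wrapped.dtype, wrapped)
```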
Another point is that by default the dtype will be numpy.float64 instead of np.int_, which contradicts the description. A related issue is #10405.
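One way to see this, assuming the point is about input that carries no type information at all (an empty list is used here as my own example), is the following sketch:

```python
import numpy as np

# Nothing to infer a type from: numpy falls back to float64, not np.int_.
empty = np.array([])
print(empty.dtype)

# An integer array has to be requested explicitly in that case.
empty_ints = np.array([], dtype=np.int_)
print(empty_ints.dtype)
```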
I don’t know what the right way to resolve this tangle is. I opened this issue here because there seems to be no interest in discussing this on numpy’s ML.
P.S. One more point: is there any real benefit to defining np.int_ as C’s long (instead of 8 bytes on 64-bit and 4 bytes on 32-bit), given the resulting differences between 64-bit Windows and other OSs? As for me, I’ve never seen any advantage of this choice, only a few inconveniences and the need to provide dtype= everywhere.
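A small sketch of the inconvenience, assuming only that np.int_, np.intp and np.int64 are compared by item size. The printed sizes depend on the platform and on the numpy version, so no output is shown:

```python
import numpy as np

# Compare the default integer with the pointer-sized and the fixed 64-bit one.
sizes = {
    name: np.dtype(t).itemsize
    for name, t in [("int_", np.int_), ("intp", np.intp), ("int64", np.int64)]
}
print(sizes)

# The workaround mentioned above: pin the width explicitly so the result
# is the same on every OS.
portable = np.array([1, 2, 3], dtype=np.int64)
print(portable.dtype)
```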
Issue Analytics
- Created 6 years ago
- Comments: 17 (15 by maintainers)
The documentation of the dtype kwarg still states
I wish we could write “If not given, then the type will be determined according to internal arbitrary rules that may change between versions.” In practice it would be good to link to a part of a NEP or other documentation that describes the algorithm used.
This is plain wrong: