linspace with int dtype sometimes doesn't include endpoints #18881

asmeurer · 2021-04-30T22:42:20Z

>>> np.linspace(-9007199254740993, 0, 1, dtype=np.int64)
array([-9007199254740992])
>>> np.linspace(0, 9007199254740993, 2, dtype=np.uint64, endpoint=True)
array([               0, 9007199254740992], dtype=uint64)
>>> np.__version__
'1.21.0.dev0+1420.gc2dd42fda'

(notice that the last digit has been changed from a 3 to a 2)

It's not clear to me if this should be considered a bug, or if this is expected from the way linspace inherently has to do rounding.

I found #16813 which seems related.

charris · 2021-04-30T23:27:08Z

It is going through float64, which has insufficient precision.

In [1]: array(9007199254740993).astype(float64).astype(uint64)
Out[1]: array(9007199254740992, dtype=uint64)

Also

In [11]: nextafter(9007199254740992, 9007199254740995)
Out[11]: 9007199254740994.0
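The precision limit charris describes can be reproduced in plain Python with no NumPy involved. This is an illustrative sketch of float64 behavior, not of NumPy's internals, and it assumes Python 3.9 or later for `math.ulp` and `math.nextafter`.

```python
import math

big = 2**53 + 1  # 9007199254740993, the value from the report

# float64 has a 53-bit significand, so 2**53 + 1 is not representable.
# Round-half-to-even sends it down to 2**53.
as_float = float(big)
print(as_float)               # 9007199254740992.0
print(int(as_float) == big)   # False: the round trip drops the final 1

# Between 2**53 and 2**54, adjacent float64 values are 2 apart,
# so the next float after 2**53 is 2**53 + 2.
print(math.ulp(2.0**53))                   # 2.0
print(math.nextafter(2.0**53, math.inf))   # 9007199254740994.0
```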

rgommers · 2021-05-04T15:58:28Z

This looks like a clear bug, integer input should not use any floating-point intermediate values.

rkern · 2021-05-04T16:28:44Z

linspace() is essentially a floating-point function. Only in rare edge cases is there a pure-integer interpretation. The dtype= argument is just for post-converting the result back to a requested dtype (mostly to retain float32). Its behavior with dtype=int is just a side effect of not explicitly disallowing it.
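A rough pure-Python model of the behavior described here, where the arithmetic happens in float64 and the integer dtype is applied only at the end, reproduces both outputs from the report. `linspace_then_cast` is a hypothetical helper written for illustration; NumPy's actual `linspace` is vectorized and handles more cases.

```python
def linspace_then_cast(start, stop, num, endpoint=True):
    """Hypothetical model: compute in float64, then cast to int at the end.

    Not NumPy's implementation; it only mimics the property under discussion.
    """
    start, stop = float(start), float(stop)  # precision can already be lost here
    div = (num - 1) if endpoint else num
    step = (stop - start) / div if div > 0 else 0.0
    values = [start + i * step for i in range(num)]
    if endpoint and num > 1:
        values[-1] = stop  # the endpoint is pinned, but to the float-rounded stop
    return [int(v) for v in values]  # the "dtype=int" post-conversion (truncation)


print(linspace_then_cast(-9007199254740993, 0, 1))  # [-9007199254740992]
print(linspace_then_cast(0, 9007199254740993, 2))   # [0, 9007199254740992]
```

The endpoint is "included" in the sense that the last slot is set to `stop`, but `stop` itself was already rounded when it was converted to a float.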

rkern · 2021-05-04T16:29:07Z

Use arange() for integers, linspace() for floats.

asmeurer · 2021-05-04T20:26:15Z

I've always seen it as "use arange if you want to specify the step size and linspace if you want to specify the number of points", regardless of the output dtype.

rkern · 2021-05-04T22:16:04Z

Well, what people want and what people can have when numerical arithmetic is concerned are often disjoint sets. :-)

Sometimes, the limitations of floating point arithmetic force us to recast what we want (something that usually makes perfect sense with real-number arithmetic like "increments from 0 to 1, inclusive, by steps of 0.1") into something that we can actually compute reliably (linspace(0, 1, 11)).
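The difference between "increments of 0.1" and "11 points from 0 to 1" shows up even with plain Python floats. The two helpers below are hypothetical illustrations, not NumPy code: one accumulates a step, the other computes each point directly from its index.

```python
def by_step(start, stop, step):
    """Accumulate `step` until passing `stop` (the 'increments of 0.1' idea)."""
    points = []
    x = start
    while x <= stop:
        points.append(x)
        x += step  # rounding error accumulates with every addition
    return points


def by_count(start, stop, num):
    """Compute each of `num` points (endpoint included) directly from its index."""
    return [start + i * (stop - start) / (num - 1) for i in range(num)]


stepped = by_step(0.0, 1.0, 0.1)
counted = by_count(0.0, 1.0, 11)
print(len(stepped), stepped[-1])  # 11 0.9999999999999999
print(len(counted), counted[-1])  # 11 1.0
```

Both produce 11 points, but only the count-based version lands exactly on the requested endpoint.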

Fundamentally, I don't think we should have two different paths for linspace() when (stop - start) just happens to be evenly divisible by num and when it doesn't.

rgommers · 2021-05-05T07:57:42Z

> Fundamentally, I don't think we should have two different paths for linspace() when (stop - start) just happens to be evenly divisible by num and when it doesn't.

That may be correct, but

> The dtype= argument is just for post-converting the result back to a requested dtype (mostly to retain float32).

then this is the problem. A dtype= argument is not supposed to work like that. If everything is a float64 calculation anyway, then having no dtype keyword and using np.linspace(...).astype() would be the clearer way to express that.

On the other hand, there does seem to be a separate complex path:

>>> np.linspace(1+1.j, 4, 5, dtype=np.complex64)
array([1.  +1.j  , 1.75+0.75j, 2.5 +0.5j , 3.25+0.25j, 4.  +0.j  ],
      dtype=complex64)

So there's no real reason for there not to be a separate integer path, where if start/stop/step are integers it would use integer division to calculate step size.

There's probably no reasonable way to change it anymore at this point, but either implementing an integer path or disallowing it would have been better. For example, np.divide refuses an integer dtype outright:

>>> np.divide(3, 2)
1.5
>>> np.divide(3, 2, dtype=np.int64)
Traceback (most recent call last):
  File "<ipython-input-27-f407b9a53a54>", line 1, in <module>
    np.divide(3, 2, dtype=np.int64)
TypeError: No loop matching the specified signature and casting was found for ufunc true_divide

eric-wieser · 2021-05-05T08:06:09Z

I doubt calculating a step size with integer division would be desirable - I find it unlikely someone using np.linspace(0, 15, 11, dtype=int) would want [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] as their output - you'd need to compute each value separately without going through a step at all.
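This objection can be checked with exact Python integers. Both helpers are hypothetical and assume `num >= 2` with the endpoint included: `by_int_step` derives one integer step and adds it repeatedly, while `per_element` computes each value separately with floor division.

```python
def by_int_step(start, stop, num):
    """One integer step, added repeatedly (assumes num >= 2, endpoint included)."""
    step = (stop - start) // (num - 1)
    return [start + i * step for i in range(num)]


def per_element(start, stop, num):
    """Each value computed on its own: start + floor(i * (stop - start) / (num - 1))."""
    return [start + (i * (stop - start)) // (num - 1) for i in range(num)]


print(by_int_step(0, 15, 11))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(per_element(0, 15, 11))  # [0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15]
```

Both are exact for the report's inputs (`0` to `9007199254740993` with two points); they differ only when `stop - start` is not evenly divisible by `num - 1`, in which case the single-step version never reaches `stop`.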

rkern · 2021-05-05T14:55:05Z

> If everything is a float64 calculation anyway, then having no dtype keyword and using np.linspace(...).astype() would be the clearer way to express that.

Yes, and if I had seen that change come across when it happened, I'd have complained then, too.

seberg · 2021-05-05T15:02:26Z

Hmmm, I guess the argument does make sense for float32 and longdouble (although we probably use float64 for a lot of calculations in float32 as well, and I am not sure that's intentional). But it really is probably a mistake for ints :(. I would be fine if we tried to deprecate it again for ints, although that probably also means jumping through some hoops.

rkern · 2021-05-05T15:05:52Z

> So there's no real reason for there not to be a separate integer path, where if start/stop/step are integers it would use integer division to calculate step size.

I'm not necessarily arguing that there can't be a separate integer path. But what would be needed to get the requested behaviors is a separate path depending on the precise values of stop-start and num, which I don't think is tenable.

It seems like there are three different expectations about what ought to happen when dtype=int. In the original PR, it was assumed that the user was requesting the .astype(int) behavior. In #16813, the expectation was that we'd use floor(). Here, we've expected a pure-integer computation in these evenly-divisible edge cases (but presumably not that in the non-evenly-divisible cases).
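The three expectations can be put side by side in a short sketch. `three_readings` is a hypothetical helper (endpoint included, `num >= 2`): the first two readings start from a float value, roughly as `linspace` does, and then either truncate toward zero (what `.astype(int)` does for in-range values) or floor; the third stays in exact integer arithmetic.

```python
import math


def three_readings(start, stop, num):
    """Hypothetical: three readings of linspace(..., dtype=int), endpoint included, num >= 2."""
    # Readings 1 and 2 start from a float value (Python floats are IEEE 754 doubles).
    floats = [start + i * (stop - start) / (num - 1) for i in range(num)]
    truncated = [int(x) for x in floats]       # like .astype(int): round toward zero
    floored = [math.floor(x) for x in floats]  # the floor() expectation
    # Reading 3 never leaves exact integer arithmetic.
    pure_int = [start + (i * (stop - start)) // (num - 1) for i in range(num)]
    return truncated, floored, pure_int


print(three_readings(-1, 0, 3))         # ([-1, 0, 0], [-1, -1, 0], [-1, -1, 0])
print(three_readings(0, 2**53 + 1, 2))  # ([0, 9007199254740992], [0, 9007199254740992], [0, 9007199254740993])
```

In the small example, truncation and floor disagree on `-0.5`; for the report's value, both float-based readings lose the final digit while the integer reading keeps it.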

Because of this variety of expectations, I'd actually lean towards deprecating np.integer dtypes to make our way back to refusing the temptation to guess.

rgommers · 2021-05-05T18:29:53Z

> But what would be needed to get the requested behaviors is a separate path depending on the precise values of stop-start and num, which I don't think is tenable.

That I agree with, very much undesirable. My expectation would be to apply casting rules to all inputs; if they're all integer then just use integer division (i.e. linspace(0, 3, num=6) --> array([0, 0, 1, 1, 2, 2]). But yeah, that's also not great.
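Carried out in pure integer arithmetic, this suggestion can be sketched as below; `int_div_linspace` is a hypothetical helper that assumes `num >= 2`. With `linspace`'s default `endpoint=True` it gives `[0, 0, 1, 1, 2, 3]`, so the array quoted above appears to correspond to `endpoint=False`.

```python
def int_div_linspace(start, stop, num, endpoint=True):
    """Hypothetical integer-only linspace (assumes num >= 2): start + i*(stop - start) // div."""
    div = num - 1 if endpoint else num
    # Floor division keeps every intermediate value an exact integer.
    return [start + (i * (stop - start)) // div for i in range(num)]


print(int_div_linspace(0, 3, 6))                  # [0, 0, 1, 1, 2, 3]
print(int_div_linspace(0, 3, 6, endpoint=False))  # [0, 0, 1, 1, 2, 2]
```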

> Because of this variety of expectations, I'd actually lean towards deprecating np.integer dtypes to make our way back to refusing the temptation to guess.

Yes, that seems like the best option indeed.

rgommers · 2021-11-12T15:04:39Z

Removing the 1.22.0 milestone; it doesn't seem critical to include and hasn't moved in >6 months.

asmeurer mentioned this issue Apr 30, 2021: Issues with array creation functions data-apis/array-api#107 (Closed)

rgommers added the "00 - Bug" and "component: numpy._core" labels May 4, 2021

charris added this to the 1.22.0 release milestone May 5, 2021

leofang mentioned this issue Sep 8, 2021: Adopt the numpy.array_api module as cupy.array_api cupy/cupy#5698 (Merged)

honno mentioned this issue Oct 29, 2021: BUG: np.arange() with int dtype can return inprecisely-sized arrays #20226 (Open)

rgommers removed this from the 1.22.0 release milestone Nov 12, 2021

asmeurer mentioned this issue Sep 29, 2022: Filter out large distances in test_linspace data-apis/array-api-tests#141 (Merged)
