linspace with int dtype sometimes doesn't include endpoints #18881

Open
asmeurer opened this issue Apr 30, 2021 · 13 comments
Comments

@asmeurer
Member

>>> np.linspace(-9007199254740993, 0, 1, dtype=np.int64)
array([-9007199254740992])
>>> np.linspace(0, 9007199254740993, 2, dtype=np.uint64, endpoint=True)
array([               0, 9007199254740992], dtype=uint64)
>>> np.__version__
'1.21.0.dev0+1420.gc2dd42fda'

(notice that the last digit has been changed from a 3 to a 2)

It's not clear to me if this should be considered a bug, or if this is expected from the way linspace inherently has to do rounding.

I found #16813 which seems related.

@charris
Member

charris commented Apr 30, 2021

It is going through float64, which has insufficient precision.

In [1]: array(9007199254740993).astype(float64).astype(uint64)
Out[1]: array(9007199254740992, dtype=uint64)

Also

In [11]: nextafter(9007199254740992, 9007199254740995)
Out[11]: 9007199254740994.0
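
The precision limit described above can be reproduced without NumPy, since a Python `float` is an IEEE 754 double just like `float64`. This is only an illustrative sketch; the variable names are made up for the example, and `math.ulp` assumes Python 3.9 or later.

```python
import math

# 2**53 + 1 is the endpoint used in the report (9007199254740993).
big = 2**53 + 1

# Converting to a double rounds to the nearest representable value,
# which at this magnitude is 2**53, so the last digit changes.
round_tripped = int(float(big))
print(round_tripped)

# Gap between adjacent doubles starting at 2**53: odd integers in this
# range have no exact float64 representation.
gap = math.ulp(float(2**53))
print(gap)
```

Any integer above 2**53 in magnitude that is not a multiple of this gap will be altered by a trip through float64.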

@rgommers
Member

rgommers commented May 4, 2021

This looks like a clear bug, integer input should not use any floating-point intermediate values.

@rkern
Member

rkern commented May 4, 2021

linspace() is essentially a floating-point function. Only in rare edge cases is there a pure-integer interpretation. The dtype= argument is just for post-converting the result back to a requested dtype (mostly to retain float32). Its behavior with dtype=int is just a side effect of not explicitly disallowing it.

@rkern
Member

rkern commented May 4, 2021

Use arange() for integers, linspace() for floats.

@asmeurer
Member Author

asmeurer commented May 4, 2021

I've always seen it as "use arange if you want to specify the step size and linspace if you want to specify the number of points", regardless of the output dtype.
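
For reference, the "step size versus number of points" distinction looks like this with small, exactly representable values. This is a minimal sketch using only documented `np.arange` and `np.linspace` behavior; the variable names are illustrative.

```python
import numpy as np

# arange: the caller fixes the step; the stop value is excluded.
by_step = np.arange(0, 10, 2)
print(by_step)

# linspace: the caller fixes the number of points; the stop value is
# included by default (endpoint=True), and the result is float64.
by_count = np.linspace(0, 8, 5)
print(by_count)
```

Both calls produce the values 0, 2, 4, 6, 8, but `by_step` has an integer dtype while `by_count` is floating point.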

@rkern
Member

rkern commented May 4, 2021

Well, what people want and what people can have when numerical arithmetic is concerned are often disjoint sets. :-)

Sometimes, the limitations of floating point arithmetic force us to recast what we want (something that usually makes perfect sense with real-number arithmetic like "increments from 0 to 1, inclusive, by steps of 0.1") into something that we can actually compute reliably (linspace(0, 1, 11)).
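
A small pure-Python sketch of why the count-based formulation is the reliable one: repeatedly adding a step of 0.1 drifts away from the intended endpoint, while computing each point from its index does not. This is illustrative only and is not how `linspace()` is implemented internally.

```python
# "Steps of 0.1": accumulate the step ten times.
acc = 0.0
for _ in range(10):
    acc += 0.1
print(acc)          # close to, but not exactly, 1.0

# "11 points from 0 to 1": derive each point from its index instead.
points = [i / 10 for i in range(11)]
print(points[0], points[-1])
```

The accumulated value misses 1.0 by one rounding error, whereas the index-based list starts at exactly 0.0 and ends at exactly 1.0.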

Fundamentally, I don't think we should have two different paths for linspace() when (stop - start) just happens to be evenly divisible by num and when it isn't.

@rgommers
Member

rgommers commented May 5, 2021

> Fundamentally, I don't think we should have two different paths for linspace() when (stop - start) just happens to be evenly divisible by num and when it isn't.

That may be correct, but

> The dtype= argument is just for post-converting the result back to a requested dtype (mostly to retain float32).

then this is the problem. A dtype= argument is not supposed to work like that. If everything is a float64 calculation anyway, then having no dtype keyword and using np.linspace(...).astype() would be the clearer way to express that.

On the other hand, there does seem to be a separate complex path:

>>> np.linspace(1+1.j, 4, 5, dtype=np.complex64)
array([1.  +1.j  , 1.75+0.75j, 2.5 +0.5j , 3.25+0.25j, 4.  +0.j  ],
      dtype=complex64)

So there's no real reason for there not to be a separate integer path, where if start/stop/step are integers it would use integer division to calculate step size.

There's probably no reasonable way to change it anymore at this point, but either implementing an integer path or disallowing it would have been better. Like, e.g., this:

>>> np.divide(3, 2)
1.5
>>> np.divide(3, 2, dtype=np.int64)
Traceback (most recent call last):
  File "<ipython-input-27-f407b9a53a54>", line 1, in <module>
    np.divide(3, 2, dtype=np.int64)
TypeError: No loop matching the specified signature and casting was found for ufunc true_divide

@eric-wieser
Member

I doubt calculating a step size with integer division would be desirable. I find it unlikely that someone using np.linspace(0, 15, 11, dtype=int) would want [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] as their output; you'd need to compute each value separately without going through a step at all.
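
One way to read "compute each value separately" is the following pure-integer sketch. `int_linspace` is a hypothetical helper written for illustration, not a NumPy function, and its choice of floor division for non-divisible spans is an assumption rather than anything agreed on in this thread.

```python
def int_linspace(start, stop, num):
    """Hypothetical integer-only analogue of linspace (endpoint included).

    Each point is computed directly from its index with Python's
    arbitrary-precision integers, so no float64 rounding occurs.
    Offsets from start are rounded toward -inf by floor division.
    """
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        return [start]
    span = stop - start
    return [start + (i * span) // (num - 1) for i in range(num)]


print(int_linspace(0, 9007199254740993, 2))
print(int_linspace(0, 15, 11))
```

With this formulation both endpoints are reproduced exactly, including the values above 2**53 from the original report, and the 0-to-15 example reaches 15 instead of stopping at 10.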

@rkern
Member

rkern commented May 5, 2021

> If everything is a float64 calculation anyway, then having no dtype keyword and using np.linspace(...).astype() would be the clearer way to express that.

Yes, and if I had seen that change come across when it happened, I'd have complained then, too.

@seberg
Member

seberg commented May 5, 2021

Hmmm, I guess the argument does make sense for float32 and longdouble (although we probably use float64 for a lot of calculations in float32 as well, and I am not sure that's intentional). But it really is probably a mistake for ints :(. I would be fine if we tried to deprecate it again for ints, although that probably also means jumping through a few hoops.

@rkern
Member

rkern commented May 5, 2021

> So there's no real reason for there not to be a separate integer path, where if start/stop/step are integers it would use integer division to calculate step size.

I'm not necessarily arguing that there can't be a separate integer path. But what would be needed to get the requested behaviors is a separate path depending on the precise values of stop-start and num, which I don't think is tenable.

It seems like there are three different expectations about what ought to happen when dtype=int. In the original PR, it was assumed that the user was requesting the .astype(int) behavior. In #16813, the expectation was that we'd use floor(). Here, we've expected a pure-integer computation in these evenly-divisible edge cases (but presumably not that in the non-evenly-divisible cases).
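
The three expectations can be compared side by side on a small range with negative values, where truncation and flooring differ. This is a pure-Python sketch of the three interpretations, not a reproduction of what any particular NumPy version returns; all names are illustrative.

```python
import math

start, stop, num = -3, 0, 5

# Real-valued points: -3.0, -2.25, -1.5, -0.75, 0.0
exact = [start + i * (stop - start) / (num - 1) for i in range(num)]

# Expectation 1: .astype(int)-style conversion, which truncates toward zero.
truncated = [math.trunc(x) for x in exact]

# Expectation 2: floor(), which rounds toward -inf.
floored = [math.floor(x) for x in exact]

# Expectation 3: a pure-integer computation (floor division here, as an
# assumed convention for the non-divisible case).
pure_int = [start + (i * (stop - start)) // (num - 1) for i in range(num)]

print(truncated)
print(floored)
print(pure_int)
```

Truncation yields a repeated 0 at the end of the range, while flooring repeats -3 at the start, so even this tiny case shows that the interpretations are not interchangeable.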

Because of this variety of expectations, I'd actually lean towards deprecating np.integer dtypes to make our way back to refusing the temptation to guess.

@rgommers
Member

rgommers commented May 5, 2021

> But what would be needed to get the requested behaviors is a separate path depending on the precise values of stop-start and num, which I don't think is tenable.

That I agree with, very much undesirable. My expectation would be to apply casting rules to all inputs; if they're all integer then just use integer division (i.e. linspace(0, 3, num=6) --> array([0, 0, 1, 1, 2, 2])). But yeah, that's also not great.

> Because of this variety of expectations, I'd actually lean towards deprecating np.integer dtypes to make our way back to refusing the temptation to guess.

Yes, that seems like the best option indeed.

@rgommers
Member

Removing the 1.22.0 milestone; this doesn't seem critical to include and hasn't moved in >6 months.

6 participants