Skip to content

Incorrect minRunLength() computing minRun to be too small #33

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
CovetingEpiphany2152 opened this issue Oct 10, 2020 · 4 comments
Closed

Comments

@CovetingEpiphany2152
Copy link

CovetingEpiphany2152 commented Oct 10, 2020

The following line seems incorrect.

Since MIN_MERGE is 32, the loop would end with n containing the 5 most significant bits instead of 6 most significant bits.
Therefore, the function would return [0, 32], which is less than what Python is doing here

The Python detailed description of Timsort in the readme here also corresponds to returning a size of [32, 64] when possible.

@Morwenn
Copy link
Member

Morwenn commented Oct 10, 2020

Indeed, CPython seems to have MIN_MERGE = 64 and it's not a late addition or fix since it was already the case when this project was started in 2011. If there is a reason for the difference I don't know what it is: the value in this C++ project has been 32 since the initial commit so there isn't any specific commit explaining it. Only @gfx might know why.

I'm going to change it and run all the tests I have to check whether everything still works. If there isn't any obvious issue I will change it to 64. Thanks for the catch.

@gfx
Copy link
Member

gfx commented Oct 11, 2020

Oh, I don't remember the why. Please fix it.

@CovetingEpiphany2152
Copy link
Author

CovetingEpiphany2152 commented Oct 11, 2020

Indeed, CPython seems to have MIN_MERGE = 64 ...

MIN_MERGE seems to mean "the minimum size at which I will start doing merging". Also, it is also used here.

So, I wouldn't change MIN_MERGE to 64. On the contrary, I would change this line to while (n >= 2*MIN_MERGE) {.
This would ensure that the name MIN_MERGE is accurate, and we do merging at sizes of range [32, 64] when there are enough elements left to "top-up" our runs to said sizes. What do you think?

Btw, thanks for maintaining this repo (and many others).

@Morwenn
Copy link
Member

Morwenn commented Oct 11, 2020

You're right once again, I applied your proposed resolution, thanks :)

Thank you Goro too for the heads up!

Morwenn added a commit that referenced this issue Oct 11, 2020
For some reason the loop condition in minRunLength was MIN_MERGE=32
while the original CPython implementation uses 64 there. This commit
changes that condition to 2*MIN_MERGE to bring the condition back to 64
at it should probably have been.

Thanks @weeyizhi for identifying and the fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants