Skip to content

BUG: Fixed tm.set_locale context manager, it could error and leak when category LC_ALL was used #47570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 8, 2022

Conversation

smarie
Copy link
Contributor

@smarie smarie commented Jul 1, 2022

@smarie
Copy link
Contributor Author

smarie commented Jul 1, 2022

It would be great to add a test reproducing the initial problem, before fixing it.
However it is a bit tricky because we need to execute this test only when we know the locale is available. And of course we can't use can_set_locale for this :)
#46595 (comment)

@smarie
Copy link
Contributor Author

smarie commented Jul 6, 2022

@jbrockmendel @mroeschke : Please review this very small PR to fix a leakage problem in set_locale (#46595).

When the problem was first fixed, then a few tests started to fail on workers it_IT and zh_CN (see https://github.com/pandas-dev/pandas/actions/runs/2597381464). Looking more closely at the problem, I saw that this was tests marked as xfail if the locale is... it_IT or zh_CN :)

https://github.com/pandas-dev/pandas/runs/7152086548?check_suite_focus=true#step:7:54

=================================== FAILURES ===================================
_ TestTimeConversionFormats.test_to_datetime_format_time[True-01/10/2010 08:14 PM-%m/%d/%Y %I:%M %p-dt3] _
[gw0] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
_ TestTimeConversionFormats.test_to_datetime_format_time[True-01/10/2010 07:40 AM-%m/%d/%Y %I:%M %p-dt4] _
[gw0] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
_ TestTimeConversionFormats.test_to_datetime_format_time[True-01/10/2010 09:12:56 AM-%m/%d/%Y %I:%M:%S %p-dt5] _
[gw0] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
_ TestTimeConversionFormats.test_to_datetime_format_time[False-01/10/2010 08:14 PM-%m/%d/%Y %I:%M %p-dt3] _
[gw0] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
_ TestTimeConversionFormats.test_to_datetime_format_time[False-01/10/2010 07:40 AM-%m/%d/%Y %I:%M %p-dt4] _
[gw0] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
_ TestTimeConversionFormats.test_to_datetime_format_time[False-01/10/2010 09:12:56 AM-%m/%d/%Y %I:%M:%S %p-dt5] _
[gw0] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
_____________________ TestToTime.test_parsers_time[2:15pm] _____________________
[gw1] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
_____________________ TestToTime.test_parsers_time[0215pm] _____________________
[gw1] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
___________________ TestToTime.test_parsers_time[2:15:00pm] ____________________
[gw1] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8
____________________ TestToTime.test_parsers_time[021500pm] ____________________
[gw1] linux -- Python 3.8.13 /home/runner/micromamba/envs/test/bin/python
[XPASS(strict)] fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8

This was actually a good thing: it proves that the fix worked !

I simply added the strict=False flag to the corresponding xfail marks, because currently pytest is run with strict xfail mode.
I also added a few tests, as well as a convenience function get_current_locale to ease long-term maintenance if this kind of very hard-to-debug issue.

When this will be merged, we will be able to proceed with PR #46405 so stay tuned :)

@smarie smarie changed the title Fixed tm.set_locale context manager, it could error and leak when category LC_ALL was used BUG: Fixed tm.set_locale context manager, it could error and leak when category LC_ALL was used Jul 6, 2022
@@ -291,6 +291,7 @@ def test_to_datetime_format_microsecond(self, cache):
marks=pytest.mark.xfail(
locale.getlocale()[0] in ("zh_CN", "it_IT"),
reason="fail on a CI build with LC_ALL=zh_CN.utf8/it_IT.utf8",
strict=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just remove these xfails entirely?

Copy link
Contributor Author

@smarie smarie Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these tests are executed on other workers, and they pass. So they are important :) We should actually try to solve the issue (datetime parsing, separate topic) if we wish to remove the xfail mark

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be make the xfail condition more specific? strict=False is really undesirable

Copy link
Contributor Author

@smarie smarie Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already quite specific: the test will be marked as xfail only if locale.getlocale()[0] in ("zh_CN", "it_IT"). So for all CI workers except the 2 workers zh_CN and it_IT, the tests are actually correctly executed (and they pass)

Note that I did not write this code, it was already there. But since there was a locale leakage bug, the zh_CN and it_IT locale was actually never active (even on the 2 corresponding CI workers) when hitting the code. This is why noone every needed to add the strict=False flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why strict=False is needed i.e. why does it sometimes pass? Is it because the locale setting isn't threadsafe? If so, these tests can be decorated with @pytest.mark.single_cpu

Copy link
Contributor Author

@smarie smarie Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is less specific IMO. See below : on test_to_time you use a fails_on_non_english mark (for readability) but with still a test about zh_CN and it_IT (because for other locales we actually do not know if it would pass or fail, therefore we do not want to use the xfail mark too prematurely). Let me know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with strict=False for now then especially if there's a path to know how we can remove these in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with strict=False for now then especially if there's a path to know how we can remove these in the future.

im not clear on what this path is. can you elaborate on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solve the issue (datetime parsing, separate topic)

From your investigation @smarie fixing the datetime parsing would allow us to remove the xfail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixing the parsing for these two locales would remove the xfail.

@mroeschke mroeschke added Testing pandas testing functions or related to the test suite Localization Internationalization of data labels Jul 8, 2022
@mroeschke mroeschke added this to the 1.5 milestone Jul 8, 2022
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM given the tests passing

@jreback jreback merged commit 8e6ca28 into pandas-dev:main Jul 8, 2022
@jreback
Copy link
Contributor

jreback commented Jul 8, 2022

thanks @smarie

@smarie smarie deleted the feature/46405_set_locale_bug branch July 11, 2022 07:15
@smarie
Copy link
Contributor Author

smarie commented Jul 11, 2022

Thanks to both of you for the fast review !

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Localization Internationalization of data Testing pandas testing functions or related to the test suite
Projects
None yet
4 participants