EUC-KR encoding/decoding support #62
Chrome's failure in form submission (note that Chromium passes the href test 100%) with 28 characters (mostly Cf characters: https://goo.gl/HKf47P) is likely caused by Blink's handling of those characters before they even reach the EUC-KR encoder. The encoder does not see them at all, which is why the output is empty.
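One way to check a hypothesis like this is to group the failing characters by Unicode general category and see whether the format (Cf) characters account for the empty output. The sketch below uses only Python's standard `unicodedata` module; the function name `group_by_category` and the sample characters are hypothetical stand-ins, not the actual 28 characters from the linked results.

```python
import unicodedata
from collections import defaultdict


def group_by_category(chars):
    """Group characters by Unicode general category (e.g. 'Cf', 'Lo')."""
    groups = defaultdict(list)
    for ch in chars:
        groups[unicodedata.category(ch)].append(f"U+{ord(ch):04X}")
    return dict(groups)


# Hypothetical sample of failing characters; substitute the real list
# from the test results.
failing = ["\u00ad", "\u200b", "\ufeff", "\uac00"]

for category, code_points in sorted(group_by_category(failing).items()):
    print(category, len(code_points), " ".join(code_points))
```

If most failures land in `Cf`, that supports the idea that they are filtered out before encoding rather than mis-encoded.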
As for Edge's behavior, Edge must be interpreting the EUC-KR label strictly, that is, as NOT being able to encode the 8,822 [1] Hangul syllables that were not part of the original KS X 1001, back when it was KS C 5601. Edge is lenient in the decoding direction, though. @r12a, have you tried using the label 'ks_c_5601-1987' instead? It would be interesting to see how Edge treats that label. MS IE used that label to refer to Windows-949 (it should not have!), even though KS C 5601-1987 has no provision for encoding the 8,822 Hangul syllables the way Windows-949 encodes them. Firefox used to have an even stricter interpretation: KS X 1001 (formerly KS C 5601) has a provision for encoding the 8,822 Hangul syllables as 8-byte sequences, and Firefox used to encode them that way in EUC-KR. I guess it no longer does.
[1] 8,822 = 11,172 (the number of all possible Hangul syllables in modern orthography) - 2,350 (the number encoded in KS X 1001).
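The footnote's arithmetic, and the 8-byte sequences mentioned above, can be checked with a short script. This is a sketch under stated assumptions: it uses Python's `cp949` codec as a stand-in for Windows-949, treats a syllable as "in KS X 1001" when both encoded bytes are >= 0xA1, and assumes the make-up sequence layout is filler, initial, medial, final (filler again when there is no final), each as a 2-byte code from KS X 1001 row 4. The helper names `in_ks_x_1001`, `row4`, and `makeup_sequence` are hypothetical.

```python
# Modern Hangul syllables: U+AC00..U+D7A3 = 19 initials x 21 medials x 28 finals.
SBASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588
S_COUNT = L_COUNT * N_COUNT  # 11,172

# Compatibility jamo (U+3131..U+314E) indexed like the syllable components.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # final index 1..27
FILLER = 0xA4D4  # HANGUL FILLER (U+3164) in KS X 1001 row 4


def in_ks_x_1001(ch):
    """True if Python's cp949 codec encodes ch as a KS X 1001 code.

    KS X 1001 codes are two bytes, both >= 0xA1; the Windows-949 (UHC)
    extension always has a lead or trail byte below 0xA1.
    """
    b = ch.encode("cp949")
    return len(b) == 2 and b[0] >= 0xA1 and b[1] >= 0xA1


def row4(jamo):
    """KS X 1001 row 4 lists U+3131 onward in order, starting at 0xA4A1."""
    return 0xA4A1 + (ord(jamo) - 0x3131)


def makeup_sequence(ch):
    """Assumed 8-byte make-up layout: filler, initial, medial, final (or filler)."""
    index = ord(ch) - SBASE
    if not 0 <= index < S_COUNT:
        raise ValueError(f"not a modern Hangul syllable: {ch!r}")
    l, rest = divmod(index, N_COUNT)
    v, t = divmod(rest, T_COUNT)
    codes = [
        FILLER,
        row4(INITIALS[l]),
        row4(chr(0x314F + v)),  # medials are contiguous from U+314F
        row4(FINALS[t - 1]) if t else FILLER,
    ]
    return b"".join(code.to_bytes(2, "big") for code in codes)


syllables = [chr(SBASE + i) for i in range(S_COUNT)]
in_ksx = sum(in_ks_x_1001(s) for s in syllables)
print(S_COUNT, in_ksx, S_COUNT - in_ksx)  # 11172 2350 8822
print(makeup_sequence("똠").hex(" "))      # a4 d4 a4 a8 a4 c7 a4 b1
```

"똠" is a commonly cited syllable missing from KS X 1001, so Windows-949 gives it a 2-byte extension code while the make-up scheme spells it out as four 2-byte jamo codes.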
The alias label tests are at https://www.w3.org/International/tests/repo/results/encoding-dbl-byte-labels.en#euckr. I tried out the ks_c_5601-1987 test, and it passed for all 17,048 characters checked, so your hypothesis may well be correct.
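In the Encoding Standard, every label of an encoding resolves to the same encoding, so a browser that treats "euc-kr" and "ks_c_5601-1987" differently is diverging from the spec. The sketch below illustrates the label lookup (strip ASCII whitespace, ASCII-lowercase, then match) restricted to EUC-KR. The label list is written from the Encoding Standard as I recall it and should be checked against the current spec; the function names are hypothetical.

```python
# EUC-KR labels as listed in the WHATWG Encoding Standard
# (verify against the current spec before relying on this list).
EUC_KR_LABELS = {
    "cseuckr", "csksc56011987", "euc-kr", "iso-ir-149", "korean",
    "ks_c_5601-1987", "ks_c_5601-1989", "ksc5601", "ksc_5601", "windows-949",
}
ASCII_WHITESPACE = "\t\n\f\r "


def ascii_lower(s):
    # Lowercase only A-Z; str.lower() would also fold non-ASCII letters,
    # e.g. U+212A KELVIN SIGN becomes "k".
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def get_encoding(label):
    """Return "EUC-KR" if label is one of its labels, else None."""
    key = ascii_lower(label.strip(ASCII_WHITESPACE))
    return "EUC-KR" if key in EUC_KR_LABELS else None


print(get_encoding(" KS_C_5601-1987\n"))  # EUC-KR
```

Restricting the case folding to ASCII keeps look-alike non-ASCII labels from matching by accident.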
@r12a, thanks for testing. Sigh...
Today and yesterday I updated the results at https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#euckr for Firefox, FNightly, Chrome, and Canary. The latest summary is:
Thank you. The EUC-KR tests LGTM for merging into WPT. /cc @domenic
Let's close this, as web-platform-tests/wpt#6258 is ready to merge.
Reopening per #61 (comment)
Now that Firefox passes all these tests and a year has passed, I'm happy to consider this done. A new issue would also be less noisy at this point, were one warranted.
Results for a series of tests of EUC-KR encoding/decoding can be found at https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#euckr
The tests can be run from that page (select the link in the left-most column), or you can get them from the WPT repo. There is a PR at web-platform-tests/wpt#3201.
The tests check whether:
The following summarises the current situation according to my testing, for major desktop browsers. (I will be adding nightly results and perhaps other browsers in time.) The table lists the number of characters that were NOT successfully converted by the test.
Notes:
Can we please investigate the failures to ascertain whether:
The following tool may be helpful for investigating issues. It converts between byte sequences and characters for all encodings in the Encoding spec. http://r12a.github.io/apps/encodings/
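For quick checks from a script, a rough local equivalent of that byte/character conversion is sketched below. It is an approximation: Python's `euc_kr` codec is not the Encoding Standard's EUC-KR, so this uses Python's `cp949` (Windows-949) codec as the closer match, but results may still differ for some code points. The function names are hypothetical.

```python
def bytes_to_text(hex_bytes, encoding="cp949"):
    """Decode space-separated hex bytes, e.g. "C7 D1", to characters."""
    return bytes.fromhex(hex_bytes).decode(encoding)


def text_to_bytes(text, encoding="cp949"):
    """Encode characters and show the result as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


print(bytes_to_text("C7 D1 B1 DB"))  # 한글
print(text_to_bytes("한글"))          # C7 D1 B1 DB
```

For authoritative answers, the spec's index and the tool above remain the reference.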