ISO 2022-jp encoding/decoding support #60

r12a · 2016-06-20T16:09:18Z

Results for a series of tests for ISO-2022-JP encoding/decoding can be found at
https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#iso2022jp

The tests can be run from that page (select the link in the left-most column), or you can get them from the WPT repo. There is a PR at
web-platform-tests/wpt#3199

The tests check whether:

the browser produces the expected byte sequences for all characters in the iso-2022-jp encoding when encoding bytes for a URL produced by a form, using the encoder steps in the specification.
the browser produces percent-escaped character references for a URL produced by a form when encoding miscellaneous characters that are not in the iso-2022-jp encoding. (tests for several ranges)
same two types of test when writing characters to an href value
the browser decodes all characters as expected from a file generated by encoding all pointers in the iso-2022-jp encoding per the encoder steps in the specification.
when decoding iso-2022-jp text, the browser uses replacement characters as described by the algorithm in the Encoding spec.
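As a rough illustration of the first two checks, the sketch below encodes a string and percent-escapes the resulting bytes roughly the way a form submission would. It rests on several assumptions: it uses Python's built-in `iso2022_jp` codec, which is not the Encoding spec's encoder and may differ from it for some characters; `form_percent_encode` is a hypothetical helper name; and the space-to-`+` rule of form serialisation is left out. Characters that are not in the encoding become decimal character references before escaping.

```python
# Bytes that application/x-www-form-urlencoded serialisation leaves unescaped.
SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789*-._"
)


def form_percent_encode(text: str) -> str:
    """Hypothetical helper: encode to ISO-2022-JP, then percent-escape bytes.

    Assumption: Python's iso2022_jp codec stands in for the spec's encoder.
    """
    # "xmlcharrefreplace" swaps unencodable characters for &#NNNN; references.
    raw = text.encode("iso2022_jp", errors="xmlcharrefreplace")
    return "".join(chr(b) if b in SAFE else f"%{b:02X}" for b in raw)


# U+3042 HIRAGANA LETTER A is in the encoding: ESC $ B, two bytes, ESC ( B.
print(form_percent_encode("\u3042"))      # %1B%24B%24%22%1B%28B
# U+1F600 is not in the encoding, so it becomes an escaped &#128512;
print(form_percent_encode("\U0001F600"))  # %26%23128512%3B
```

The second line of output is the "percent-escaped character reference" form that the error tests look for.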

The following summarises the current situation according to my testing, for major desktop browsers. (I will be adding nightly results and perhaps other browsers in time.) The table lists the number of characters that were NOT successfully converted by the test.

Notes:

Edge fails all href encode tests because characters are not converted to percent-escapes in the href attribute.
Firefox fails all href encode tests for characters not in the encoding because it converts characters to percent-escaped Unicode values instead.

Can we please investigate the failures to ascertain whether:

the browser needs to be changed
the spec needs to be changed
the test is at fault

The following tool may be helpful for investigating issues. It converts between byte sequences and characters for all encodings in the Encoding spec. http://r12a.github.io/apps/encodings/

r12a · 2016-06-23T17:04:03Z

The figure for iso2022jp-encode-href-errors-misc ought to be 79.

r12a · 2016-09-15T17:10:30Z

List of bugs raised:

jungshik · 2016-09-16T06:54:10Z

Chromium:

form/href-encoding-misc:
Out of 93, 30 characters (Cf, default ignorable) share the same cause as Big5 encoding/decoding support #58, EUC-jp encoding/decoding support #59, Shift-JIS encoding/decoding support #61, EUK-kr encoding/decoding support #62.
The rest seem to be half-width Katakana. I remember raising an issue with this somewhere (and taking action), but I couldn't find it.
form/href-encoding: 373 characters. Mostly CJK Ideographs. Chromium treats them as not covered by ISO-2022-JP. Need to investigate. ISO-2022-JP in ICU (used by Chrome) shares the table with Shift_JIS (and Shift_JIS in Chromium passes the tests).

r12a · 2016-09-16T11:28:46Z

@jungshik is this the bug report about half-width katakana that you were looking for?

https://bugs.chromium.org/p/chromium/issues/detail?id=544402&thanks=544402&ts=1445064020

jungshik · 2016-09-22T17:51:41Z

@r12a, No, that's not what I had in mind. I remember doing something (at least filing a bug) to take a look at what you reported here, half-width Katakana in ISO-2022-JP, but I couldn't find it.

If you have the same issue in UTF-8 at http://r12a.github.io/uniview/?block=halfwidth_and_fullwidth_forms (which I couldn't reproduce in Chrome on my Mac in the past; I didn't try it today), it cannot be related to the encoding conversion.

hsivonen · 2017-04-27T15:36:01Z

Two decode expectations for malformed sequences seem wrong:
Fail escape start: 1B 65 79 56 1B 28 42 assert_equals: expected "�eByV" but got "�eyV"
Fail escape: 1B 24 65 79 56 1B 28 42 assert_equals: expected "�e$yV" but got "�$eyV"

vyv03354 · 2017-06-14T13:39:23Z

Firefox Nightly 56 has improved a lot, but the encoding error tests still have 63 failures and the decoding error tests have 2 failures.

r12a · 2017-06-15T07:36:26Z

Today and yesterday i updated the results at https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#iso2022jp for Firefox, FNightly, Chrome, and Canary. The latest summary is:

hsivonen · 2017-06-15T09:16:44Z

> Firefox Nightly 56 has improved a lot, but the encoding error tests still have 63 failures and the decoding error tests have 2 failures.

Per earlier comment, the decoder error handling failures are test suite bugs.

The encoder failures are due to the test suite not having been updated to account for the spec change to half-width katakana handling.

r12a · 2017-08-09T16:55:17Z

> The encoder failures are due to the test suite not having been updated to account for the spec change to half-width katakana handling.

@hsivonen i updated the encoder algorithm used by the tests. I haven't updated the results page yet, but you can run the tests from https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#iso2022jp (click on the title in the left column).

The result of that fix is that Chrome and Safari now pass iso2022jp-encode-form-errors-misc.html cleanly. Firefox however still sticks on 8 characters (not katakana), so i'm guessing that may be a FF bug(?)

r12a · 2017-08-09T17:09:03Z

> Two decode expectations for malformed sequences seem wrong:
> Fail escape start: 1B 65 79 56 1B 28 42 assert_equals: expected "�eByV" but got "�eyV"
> Fail escape: 1B 24 65 79 56 1B 28 42 assert_equals: expected "�e$yV" but got "�$eyV"

@hsivonen @annevk i fixed the first, which was indeed a bug. (Wrong expectations.)
However, the second test you mention above still fails in FF, Chrome and Safari.

I suspect it may be a bug in the decoder algorithm at https://encoding.spec.whatwg.org/#iso-2022-jp-decoder

step escape.8 says:

> Prepend lead and byte to stream.

To get what the browsers are actually returning, i think it needs to say
"Prepend byte and lead to stream"

Or perhaps better, specify explicitly the order in which those two should end up when prepended.

r12a · 2017-08-09T17:14:17Z

I should mention that i tested those in FF 55.

annevk · 2017-08-09T17:36:27Z

What it means is that they're to be prepended together, in the order given, as specified at https://encoding.spec.whatwg.org/#concept-stream-prepend. I'm not sure how you can read it any other way.

annevk · 2017-08-09T17:38:03Z

It would seem weird to prepend lead and then prepend byte (aka prepend byte and lead) as that would have them be in the reverse order when you read that stream.
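The two readings of "prepend lead and byte to stream" can be compared with a small sketch. This is only an illustration under assumptions: the stream is modelled as a Python `deque` that is read from the left, and `prepend_together` and `prepend_one_by_one` are hypothetical names, not terms from the spec. The values are the lead (0x24) and byte (0x65) from the failing test, followed by the rest of the input.

```python
from collections import deque


def prepend_together(stream: deque, tokens: list) -> None:
    """Insert all tokens before the stream's first token, keeping their order."""
    # extendleft() reverses its argument, so reverse first to keep given order.
    stream.extendleft(reversed(tokens))


def prepend_one_by_one(stream: deque, tokens: list) -> None:
    """Prepend the first token, then prepend the second in front of it."""
    for token in tokens:
        stream.appendleft(token)


rest = [0x79, 0x56]          # "yV", the bytes still unread
lead, byte = 0x24, 0x65      # "$" and "e"

together = deque(rest)
prepend_together(together, [lead, byte])

one_by_one = deque(rest)
prepend_one_by_one(one_by_one, [lead, byte])

print(bytes(together))    # b'$eyV'
print(bytes(one_by_one))  # b'e$yV'
```

Prepending together gives the `$eyV` order that the browsers return; prepending one at a time gives the reversed `e$yV` order that the test expected.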

hsivonen · 2017-08-09T17:53:46Z

encoding_rs went into Firefox 56, so testing 56 is more useful than testing 55.

I see one failure (escape: 1B 24 65 79 56 1B 28 42 | assert_equals: expected "�e$yV" but got "�$eyV") in Firefox Nightly. This is clearly a test case bug with the test case having e and $ reversed compared to the ASCII interpretation of the input bytes.
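To trace the two malformed sequences byte by byte, here is a deliberately partial sketch of the escape handling. The assumptions are significant: it covers only the ASCII, escape start and escape states, it ignores the output flag and end-of-stream inside an escape, and it raises for the valid escapes that switch away from ASCII. `decode_sketch` is a hypothetical name, and this is not the spec's full decoder.

```python
REPLACEMENT = "\ufffd"

# Valid (lead, byte) pairs after ESC; only ESC ( B is handled in this sketch.
VALID_ESCAPES = {(0x28, 0x42), (0x28, 0x4A), (0x28, 0x49), (0x24, 0x40), (0x24, 0x42)}


def decode_sketch(data: bytes) -> str:
    """Partial ISO-2022-JP decoding: ASCII output plus escape error handling."""
    stream = list(data)  # index 0 is the next byte to be read
    out = []
    state = "ascii"
    lead = 0x00
    while stream:
        byte = stream.pop(0)
        if state == "ascii":
            if byte == 0x1B:
                state = "escape start"
            elif byte <= 0x7F and byte not in (0x0E, 0x0F):
                out.append(chr(byte))
            else:
                out.append(REPLACEMENT)
        elif state == "escape start":
            if byte in (0x24, 0x28):
                lead = byte
                state = "escape"
            else:
                stream.insert(0, byte)  # prepend byte, then signal an error
                state = "ascii"
                out.append(REPLACEMENT)
        else:  # state == "escape"
            if (lead, byte) == (0x28, 0x42):
                state = "ascii"  # ESC ( B: back to ASCII, nothing emitted
            elif (lead, byte) in VALID_ESCAPES:
                raise NotImplementedError("non-ASCII states are out of scope")
            else:
                stream[0:0] = [lead, byte]  # prepend lead and byte, lead first
                state = "ascii"
                out.append(REPLACEMENT)
    return "".join(out)


print(decode_sketch(bytes.fromhex("1B 65 79 56 1B 28 42")))     # �eyV
print(decode_sketch(bytes.fromhex("1B 24 65 79 56 1B 28 42")))  # �$eyV
```

Under these assumptions both sequences decode to what the browsers return, with `$` before `e` in the second case.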

r12a · 2017-08-09T19:39:27Z

> What it means is that they're to be prepended together, in the order given, as specified at https://encoding.spec.whatwg.org/#concept-stream-prepend. I'm not sure how you can read it any other way.

I read

> those tokens must be inserted, in given order

as "insert the first one, and then insert the second one". Depends whether 'in given order' refers to 'insert' or 'the tokens'. I read it as "prepend lead, then byte to stream". (I'll admit that i was following the wording rather than the deep logic of what was going on.)

Something like you just said may be clearer, eg. Prepend lead and byte together to stream.

Anyway, i'll fix it.

annevk · 2018-10-17T07:53:45Z

Now that Firefox passes all these tests and a year has passed, I'm happy to consider this done. A new issue would also be less noisy at this point, were one warranted.

If you want to pursue changing the wording here I'd be open to that by the way, but let's discuss that in a new issue.

jungshik · 2018-11-06T11:00:47Z

> 2. form/href-encoding: 373 characters. Mostly CJK Ideographs. Chromium treats them as not covered by ISO-2022-JP. Need t

It's now fixed in Chromium's ToT. It'll be included in the next Canary and in Chrome 72 (which will turn stable in January; dev/beta before that). https://crbug.com/901255

annevk added the tests label Nov 16, 2016

r12a mentioned this issue Aug 15, 2017

ISO 2022-jp encoding and decoding tests web-platform-tests/wpt#3199

Closed

annevk closed this as completed Oct 17, 2018
