Skip to content

gb18030 decoder doesn't clear first, second, third after returning 4-byte encoded code point #146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chfoo opened this issue Jun 14, 2018 · 4 comments
Labels
good first issue Ideal for someone new to a WHATWG standard or software project normative

Comments

@chfoo
Copy link

chfoo commented Jun 14, 2018

https://encoding.spec.whatwg.org/commit-snapshots/b04091a5f079a7bdcab5aa8c7adead554326a96c/#gb18030-decoder

If gb18030 third is not 0x00, then:​

If byte is not in the range 0x30 to 0x39, inclusive, then:​

Prepend gb18030 second, gb18030 third, and byte to stream.

Set gb18030 first, gb18030 second, and gb18030 third to 0x00.

Return error.

Let code point be the index gb18030 ranges code point for ((gb18030 first − 0x81) × (10 × 126 × 10)) + ((gb18030 second − 0x30) × (10 × 126)) + ((gb18030 third − 0x81) × 10) + byte − 0x30.

If code point is null, return error.

Return a code point whose value is code point.

I'm having trouble understanding how, after the last step above, the decoder will accept the next byte correctly. Because gb18030 first/gb18030 second/gb18030 third is not 0x00 after this last step, it seems to enter the wrong steps for subsequent bytes.

For example, if I have the byte sequence in hex 20 81 40 84 31 83 30, decoding it will result in 丂︔� (error at the end) but the expected is 丂︔.

I think "set gb18030 first, gb18030 second, and gb18030 third to 0x00" before returning error or code point is missing?

@annevk
Copy link
Member

annevk commented Jun 14, 2018

Thank you! This was introduced in #111. @hsivonen should I go back to my original proposed wording or duplicate the setting to 0x00 of these state variables?

(An alternative that avoids duplication would be some kind of "return and unset" routine, but that doesn't really fit in the current setup.)

@annevk
Copy link
Member

annevk commented Jun 14, 2018

@chfoo forgot to ask, ok to acknowledge you as Christopher Foo? Or would you prefer a different name? (Also, if you'd like, you'd be welcome to submit a PR to fix this after @hsivonen confirms an approach.)

@chfoo
Copy link
Author

chfoo commented Jun 14, 2018

Yes, that acknowledgement is fine.

@annevk annevk added normative good first issue Ideal for someone new to a WHATWG standard or software project labels Jun 15, 2018
annevk added a commit that referenced this issue Aug 30, 2018
Due to an oversight in #111 the gb18030 decoder didn't always reset state before returning after a four-byte sequence.

Fixes #146.
@annevk
Copy link
Member

annevk commented Aug 30, 2018

@chfoo sorry for the delay, #153 has a proposed fix.

annevk added a commit that referenced this issue Aug 30, 2018
Due to an oversight in #111 the gb18030 decoder didn't always reset state before returning after a four-byte sequence.

Fixes #146.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Ideal for someone new to a WHATWG standard or software project normative
Development

No branches or pull requests

2 participants