From a0c98be7709f4d7d409ddb23663fbf62b696af97 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Thu, 11 May 2017 13:25:39 +0200
Subject: [PATCH 1/2] gb18030 decoder: unwind from fourth byte when it's not a digit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Instead of always unwinding if there’s no code point when consuming the
fourth byte, only unwind when the fourth byte is not an ASCII digit.

This does mean that ASCII digits can be masked, but since ASCII digits
are not used as delimiter in any format this is highly unlikely to be
used in any attacks (and also matches existing implementations better).

Fixes #110.
---
 encoding.bs | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/encoding.bs b/encoding.bs
index 018d917..809b3ab 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1641,11 +1641,13 @@ consumers of content generated with GBK's encoder.
   • Set gb18030 first, gb18030 second, and gb18030 third to 0x00.
 
-  • If code point is null,
-    prepend buffer to
-    stream and return error.
+  • If code point is non-null, return a code point whose value is
+    code point.
 
-  • Return a code point whose value is code point.
+  • If byte is not in the range 0x30 to 0x39, inclusive, prepend
+    buffer to stream.
+
+  • Return error.
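
The behavioral change in this hunk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spec text: `ERROR`, `tail_before_patch`, and `tail_after_patch` are hypothetical names, the stream is modeled as a plain list whose front is index 0, and `buffer` stands for the byte sequence of gb18030 second, gb18030 third, and byte that the surrounding steps build.

```python
from typing import List, Optional, Union

ERROR = "error"  # hypothetical sentinel standing in for the spec's "error"


def tail_before_patch(code_point: Optional[int], buffer: List[int],
                      stream: List[int]) -> Union[int, str]:
    """Removed steps: always unwind when there is no code point."""
    if code_point is None:
        stream[0:0] = buffer  # prepend buffer to stream
        return ERROR
    return code_point


def tail_after_patch(code_point: Optional[int], byte: int, buffer: List[int],
                     stream: List[int]) -> Union[int, str]:
    """Added steps: unwind only when the fourth byte is not an ASCII digit."""
    if code_point is not None:
        return code_point
    if not 0x30 <= byte <= 0x39:
        stream[0:0] = buffer  # prepend buffer to stream
    return ERROR
```

With a fourth byte that is an ASCII digit but no code point, the old steps push the three bytes back onto the stream while the new steps consume them, which is the "masking" the commit message refers to. A non-digit fourth byte is pushed back in both versions.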

From 951c4714dee1eb06bc5b0b94a5d48bed2ae7eaba Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Thu, 11 May 2017 13:48:30 +0200
Subject: [PATCH 2/2] rewrite

---
 encoding.bs | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/encoding.bs b/encoding.bs
index 809b3ab..80f5f04 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1625,29 +1625,26 @@ consumers of content generated with GBK's encoder.

   • If gb18030 third is not 0x00, then:
 
-    • Let code point be null.
+    • If byte is not in the range 0x30 to 0x39, inclusive, then:
 
-    • If byte is in the range 0x30 to 0x39, inclusive, set
-      code point to the
-      index gb18030 ranges code point for
-      ((gb18030 first − 0x81) × (10 × 126 × 10)) +
-      ((gb18030 second − 0x30) × (10 × 126)) +
-      ((gb18030 third − 0x81) × 10) + byte − 0x30.
+      • Prepend gb18030 second, gb18030 third, and byte to
+        stream.
 
-    • Let buffer be a byte sequence consisting of
-      gb18030 second, gb18030 third, and byte, in
-      order.
+      • Set gb18030 first, gb18030 second, and gb18030 third to 0x00.
 
-    • Set gb18030 first, gb18030 second, and
-      gb18030 third to 0x00.
+      • Return error.
 
-    • If code point is non-null, return a code point whose value is
-      code point.
+    • Let code point be the index gb18030 ranges code point for
+      ((gb18030 first − 0x81) × (10 × 126 × 10)) +
+      ((gb18030 second − 0x30) × (10 × 126)) +
+      ((gb18030 third − 0x81) × 10) + byte − 0x30.
 
-    • If byte is not in the range 0x30 to 0x39, inclusive, prepend
-      buffer to stream.
+    • If code point is null, return error.
 
-    • Return error.
+    • Return a code point whose value is code point.
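
The rewritten steps and the pointer arithmetic can be sketched as follows. This is a rough Python illustration under stated assumptions, not the spec's algorithm verbatim: `ERROR`, `gb18030_pointer`, and `fourth_byte_step` are hypothetical names, the stream is a plain list whose front is index 0, and `lookup` is a caller-supplied stand-in for "index gb18030 ranges code point" that returns `None` for an unmapped pointer. Decoder state (resetting gb18030 first, second, and third) is not modeled here; only the steps visible in the hunk are mirrored.

```python
from typing import Callable, List, Optional, Union

ERROR = "error"  # hypothetical sentinel standing in for the spec's "error"


def gb18030_pointer(first: int, second: int, third: int, byte: int) -> int:
    """Pointer formula from the hunk, for a four-byte sequence."""
    return ((first - 0x81) * (10 * 126 * 10)
            + (second - 0x30) * (10 * 126)
            + (third - 0x81) * 10
            + byte - 0x30)


def fourth_byte_step(first: int, second: int, third: int, byte: int,
                     stream: List[int],
                     lookup: Callable[[int], Optional[int]]) -> Union[int, str]:
    """Handle the fourth byte in the order given by the added lines."""
    if not 0x30 <= byte <= 0x39:
        # Not an ASCII digit: push the last three bytes back and fail.
        # (The hunk also resets the three state variables at this point.)
        stream[0:0] = [second, third, byte]
        return ERROR
    code_point = lookup(gb18030_pointer(first, second, third, byte))
    if code_point is None:
        # Digit but unmapped pointer: fail without unwinding.
        return ERROR
    return code_point
```

For example, the bytes 0x84 0x31 0xA4 0x39 give the pointer (3 × 12600) + (1 × 1260) + (35 × 10) + 9 = 39419. Checking the digit range first means `lookup` is only ever called with a pointer built from a well-formed fourth byte, which is what lets the rewrite drop the nullable intermediate and the separate buffer.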