ISO-2022-JP encoder: convert halfwidth Katakana to fullwidth #106

Merged
merged 4 commits into from
May 8, 2017
32 changes: 21 additions & 11 deletions encoding.bs
Original file line number Diff line number Diff line change
@@ -655,10 +655,11 @@ changed, so has the <a>index</a>.
<var>code point</var> is not in <var>index</var>.

<div class=note id=visualization>
<p>There is a non-normative visualization for each index other than <a>index gb18030 ranges</a>.
<a>index jis0208</a> also has an alternative <a>Shift_JIS</a> visualization. Additionally, there is
visualization of the Basic Multilingual Plane coverage of each index other than
<a>index gb18030 ranges</a>.
<p>There is a non-normative visualization for each <a>index</a> other than
<a>index gb18030 ranges</a> and <a>index ISO-2022-JP katakana</a>. <a>index jis0208</a> also has an
alternative <a>Shift_JIS</a> visualization. Additionally, there is visualization of the Basic
Multilingual Plane coverage of each index other than <a>index gb18030 ranges</a> and
<a>index ISO-2022-JP katakana</a>.

<p>The legend for the visualizations is:

@@ -748,6 +749,12 @@ specification, excluding <a>index single-byte</a>, which have their own table:
No JIS X 0212 ISO-2022-JP support:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=26885
-->
<tr>
<td><dfn export>index ISO-2022-JP katakana</dfn>
<td colspan=3><a href=index-iso-2022-jp-katakana.txt>index-iso-2022-jp-katakana.txt</a>
<td>This maps halfwidth to fullwidth katakana as per Unicode Normalization Form KC, except that
U+FF9E and U+FF9F map to U+309B and U+309C rather than U+3099 and U+309A. It is only used by the
<a>ISO-2022-JP encoder</a>. [[UNICODE]]
</table>

<p>The <dfn>index gb18030 ranges code point</dfn> for <var>pointer</var> is
@@ -826,10 +833,9 @@ these steps:

<hr>

<p class="note no-backref">All <a lt=index>indexes</a> are also available as
non-normative <a href=indexes.json>indexes.json</a> resource.
(<a>index gb18030 ranges</a> has a slightly different format here, to be able
to represent ranges.)
<p class="note no-backref">All <a lt=index>indexes</a> are also available as a non-normative
<a href=indexes.json>indexes.json</a> resource. (<a>Index gb18030 ranges</a> has a slightly
different format here, to be able to represent ranges.)



@@ -1901,7 +1907,7 @@ consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.
<li><p>If <a>EUC-JP lead</a> is 0x8E and <var>byte</var> is
in the range 0xA1 to 0xDF, inclusive, set <a>EUC-JP lead</a> to 0x00 and return
a code point whose value is 0xFF61 &minus; 0xA1 + <var>byte</var>.
<!-- katakana; subtraction is done first to avoid upsetting compilers -->
<!-- Katakana; subtraction is done first to avoid upsetting compilers -->

<li><p>If <a>EUC-JP lead</a> is 0x8F and <var>byte</var> is in the range
0xA1 to 0xFE, inclusive, set the <a>EUC-JP jis0212 flag</a>, set
@@ -2053,7 +2059,7 @@ consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.
<dd><p>Unset the <a>ISO-2022-JP output flag</a> and return <a>error</a>.
</dl>

<dt><dfn lt="ISO-2022-JP decoder Katakana">Katakana</dfn>
<dt><dfn lt="ISO-2022-JP decoder katakana">katakana</dfn>
<dd>
<p>Based on <var>byte</var>:
<dl class=switch>
@@ -2169,7 +2175,7 @@ consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.
<var>state</var> to <a lt="ISO-2022-JP decoder Roman">Roman</a>.

<li><p>If <var>lead</var> is 0x28 and <var>byte</var> is 0x49<!--I-->, set
<var>state</var> to <a lt="ISO-2022-JP decoder Katakana">Katakana</a>.
<var>state</var> to <a lt="ISO-2022-JP decoder katakana">katakana</a>.

<li><p>If <var>lead</var> is 0x24 and <var>byte</var> is either
0x40<!--@--> or 0x42<!--B-->, set <var>state</var> to
@@ -2270,6 +2276,10 @@ consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.

<li><p>If <var>code point</var> is U+2212, set it to U+FF0D.

<li><p>If <var>code point</var> is in the range U+FF61 to U+FF9F, inclusive, set it to the
<a>index code point</a> for <var>code point</var> &minus; 0xFF61 in
<a>index ISO-2022-JP katakana</a>.

<li>
<p>Let <var>pointer</var> be the <a>index pointer</a> for <var>code point</var> in
<a>index jis0208</a>.
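
The new encoder step above (fold U+FF61 to U+FF9F through index ISO-2022-JP katakana before the index jis0208 lookup) can be sketched in Python as follows. This is a minimal illustration rather than the specification's reference implementation: the names `ISO_2022_JP_KATAKANA`, `fold_halfwidth_katakana`, and `fold_text` are hypothetical, and the table is copied by hand from index-iso-2022-jp-katakana.txt as added in this pull request.

```python
# Hypothetical in-memory copy of index-iso-2022-jp-katakana.txt:
# list position = pointer, value = fullwidth code point.
ISO_2022_JP_KATAKANA = [
    0x3002, 0x300C, 0x300D, 0x3001, 0x30FB, 0x30F2, 0x30A1, 0x30A3,
    0x30A5, 0x30A7, 0x30A9, 0x30E3, 0x30E5, 0x30E7, 0x30C3, 0x30FC,
    0x30A2, 0x30A4, 0x30A6, 0x30A8, 0x30AA, 0x30AB, 0x30AD, 0x30AF,
    0x30B1, 0x30B3, 0x30B5, 0x30B7, 0x30B9, 0x30BB, 0x30BD, 0x30BF,
    0x30C1, 0x30C4, 0x30C6, 0x30C8, 0x30CA, 0x30CB, 0x30CC, 0x30CD,
    0x30CE, 0x30CF, 0x30D2, 0x30D5, 0x30D8, 0x30DB, 0x30DE, 0x30DF,
    0x30E0, 0x30E1, 0x30E2, 0x30E4, 0x30E6, 0x30E8, 0x30E9, 0x30EA,
    0x30EB, 0x30EC, 0x30ED, 0x30EF, 0x30F3, 0x309B, 0x309C,
]
assert len(ISO_2022_JP_KATAKANA) == 63  # pointers 0..62


def fold_halfwidth_katakana(code_point: int) -> int:
    """Map a halfwidth katakana code point to its fullwidth counterpart.

    Code points outside U+FF61..U+FF9F are returned unchanged, so the
    result can be passed straight on to the index jis0208 lookup.
    """
    if 0xFF61 <= code_point <= 0xFF9F:
        # The pointer is the offset from U+FF61, as in the spec step.
        return ISO_2022_JP_KATAKANA[code_point - 0xFF61]
    return code_point


def fold_text(text: str) -> str:
    """Apply the fold to every code point of a string (illustration only)."""
    return "".join(chr(fold_halfwidth_katakana(ord(ch))) for ch in text)
```

For instance, `fold_text("\uFF76\uFF9E")` yields U+30AB followed by U+309B: the halfwidth voiced sound mark becomes the spacing fullwidth mark and is not composed with the preceding letter into U+30AC.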
2 changes: 1 addition & 1 deletion index-big5.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 8dfc771062e7be0810919082c2c06baa2236147909e0ecc235b1cb9ad782ac82
# Date: 2016-10-24
# Date: 2017-05-06

942 0x43F0 䏰 (<CJK Ideograph Extension A>)
943 0x4C32 䰲 (<CJK Ideograph Extension A>)
2 changes: 1 addition & 1 deletion index-euc-kr.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 1d97134cbf187263585bc8f593ca4196654ed4c7a673f5672eaad4f5d9fdc4ba
# Date: 2016-10-24
# Date: 2017-05-06

0 0xAC02 갂 (HANGUL SYLLABLE GAGG)
1 0xAC03 갃 (HANGUL SYLLABLE GAGS)
2 changes: 1 addition & 1 deletion index-gb18030-ranges.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f963aaa1653f630c523e7b04729fb4e4458f35806c45eb5c179445623138f0c0
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080
36 0x00A5
2 changes: 1 addition & 1 deletion index-gb18030.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 715f084846f5c6fc9dd31046d0a4d604bd2d88bfe3a22833cea048415e413c70
# Date: 2016-10-24
# Date: 2017-05-06

0 0x4E02 丂 (<CJK Ideograph>)
1 0x4E04 丄 (<CJK Ideograph>)
2 changes: 1 addition & 1 deletion index-ibm866.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: db6fe14a559d1601a7667338d83704773d5708dbc641e1ad3c5e21405770f05e
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0410 А (CYRILLIC CAPITAL LETTER A)
1 0x0411 Б (CYRILLIC CAPITAL LETTER BE)
72 changes: 72 additions & 0 deletions index-iso-2022-jp-katakana.txt
@@ -0,0 +1,72 @@
# Any copyright is dedicated to the Public Domain.
# https://creativecommons.org/publicdomain/zero/1.0/
#
# For details on index index-iso-2022-jp-katakana.txt see the Encoding Standard
# https://encoding.spec.whatwg.org/
#
# Identifier: 6ffc12c11f6eab1ccb3dada740d9b0db096ef0b0783c3bd5ec951dcb4a44b95e
# Date: 2017-05-06

0 0x3002 。 (IDEOGRAPHIC FULL STOP)
1 0x300C 「 (LEFT CORNER BRACKET)
2 0x300D 」 (RIGHT CORNER BRACKET)
3 0x3001 、 (IDEOGRAPHIC COMMA)
4 0x30FB ・ (KATAKANA MIDDLE DOT)
5 0x30F2 ヲ (KATAKANA LETTER WO)
6 0x30A1 ァ (KATAKANA LETTER SMALL A)
7 0x30A3 ィ (KATAKANA LETTER SMALL I)
8 0x30A5 ゥ (KATAKANA LETTER SMALL U)
9 0x30A7 ェ (KATAKANA LETTER SMALL E)
10 0x30A9 ォ (KATAKANA LETTER SMALL O)
11 0x30E3 ャ (KATAKANA LETTER SMALL YA)
12 0x30E5 ュ (KATAKANA LETTER SMALL YU)
13 0x30E7 ョ (KATAKANA LETTER SMALL YO)
14 0x30C3 ッ (KATAKANA LETTER SMALL TU)
15 0x30FC ー (KATAKANA-HIRAGANA PROLONGED SOUND MARK)
16 0x30A2 ア (KATAKANA LETTER A)
17 0x30A4 イ (KATAKANA LETTER I)
18 0x30A6 ウ (KATAKANA LETTER U)
19 0x30A8 エ (KATAKANA LETTER E)
20 0x30AA オ (KATAKANA LETTER O)
21 0x30AB カ (KATAKANA LETTER KA)
22 0x30AD キ (KATAKANA LETTER KI)
23 0x30AF ク (KATAKANA LETTER KU)
24 0x30B1 ケ (KATAKANA LETTER KE)
25 0x30B3 コ (KATAKANA LETTER KO)
26 0x30B5 サ (KATAKANA LETTER SA)
27 0x30B7 シ (KATAKANA LETTER SI)
28 0x30B9 ス (KATAKANA LETTER SU)
29 0x30BB セ (KATAKANA LETTER SE)
30 0x30BD ソ (KATAKANA LETTER SO)
31 0x30BF タ (KATAKANA LETTER TA)
32 0x30C1 チ (KATAKANA LETTER TI)
33 0x30C4 ツ (KATAKANA LETTER TU)
34 0x30C6 テ (KATAKANA LETTER TE)
35 0x30C8 ト (KATAKANA LETTER TO)
36 0x30CA ナ (KATAKANA LETTER NA)
37 0x30CB ニ (KATAKANA LETTER NI)
38 0x30CC ヌ (KATAKANA LETTER NU)
39 0x30CD ネ (KATAKANA LETTER NE)
40 0x30CE ノ (KATAKANA LETTER NO)
41 0x30CF ハ (KATAKANA LETTER HA)
42 0x30D2 ヒ (KATAKANA LETTER HI)
43 0x30D5 フ (KATAKANA LETTER HU)
44 0x30D8 ヘ (KATAKANA LETTER HE)
45 0x30DB ホ (KATAKANA LETTER HO)
46 0x30DE マ (KATAKANA LETTER MA)
47 0x30DF ミ (KATAKANA LETTER MI)
48 0x30E0 ム (KATAKANA LETTER MU)
49 0x30E1 メ (KATAKANA LETTER ME)
50 0x30E2 モ (KATAKANA LETTER MO)
51 0x30E4 ヤ (KATAKANA LETTER YA)
52 0x30E6 ユ (KATAKANA LETTER YU)
53 0x30E8 ヨ (KATAKANA LETTER YO)
54 0x30E9 ラ (KATAKANA LETTER RA)
55 0x30EA リ (KATAKANA LETTER RI)
56 0x30EB ル (KATAKANA LETTER RU)
57 0x30EC レ (KATAKANA LETTER RE)
58 0x30ED ロ (KATAKANA LETTER RO)
59 0x30EF ワ (KATAKANA LETTER WA)
60 0x30F3 ン (KATAKANA LETTER N)
61 0x309B ゛ (KATAKANA-HIRAGANA VOICED SOUND MARK)
62 0x309C ゜ (KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK)
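
The description of index ISO-2022-JP katakana in encoding.bs says the mapping follows Unicode Normalization Form KC except for U+FF9E and U+FF9F. A rough way to check that claim against the index file is sketched below in Python. The helper names `parse_index` and `nfkc_differences` are hypothetical, `SAMPLE` holds only a few rows copied from the file above, and the parser assumes whitespace-separated columns with `#` comment lines.

```python
import unicodedata

# A few rows copied from index-iso-2022-jp-katakana.txt (not the full file).
SAMPLE = """\
# Any copyright is dedicated to the Public Domain.
0 0x3002 。 (IDEOGRAPHIC FULL STOP)
5 0x30F2 ヲ (KATAKANA LETTER WO)
21 0x30AB カ (KATAKANA LETTER KA)
61 0x309B ゛ (KATAKANA-HIRAGANA VOICED SOUND MARK)
62 0x309C ゜ (KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK)
"""


def parse_index(text: str) -> dict[int, int]:
    """Parse index rows into {pointer: code point}, skipping comments."""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first two columns matter; the rest is informative.
        pointer, code_point = line.split()[:2]
        index[int(pointer)] = int(code_point, 16)
    return index


def nfkc_differences(index: dict[int, int]) -> dict[int, tuple[str, str]]:
    """Return {halfwidth code point: (NFKC result, index result)} where they differ."""
    diffs = {}
    for pointer, code_point in index.items():
        halfwidth = 0xFF61 + pointer  # inverse of the encoder's pointer computation
        nfkc = unicodedata.normalize("NFKC", chr(halfwidth))
        if nfkc != chr(code_point):
            diffs[halfwidth] = (nfkc, chr(code_point))
    return diffs
```

Running `nfkc_differences(parse_index(SAMPLE))` on these rows reports only U+FF9E and U+FF9F, where NFKC gives the combining marks U+3099 and U+309A and the index gives the spacing marks U+309B and U+309C.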
2 changes: 1 addition & 1 deletion index-iso-8859-10.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 02c2b5590d8ccda9931008c471f6ee2c590b2c8fe5e6ccb3b08638115d778507
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-13.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 40736338e964ab520407cebcb01329f8d450abf6ce12bf88b74b655b60e43300
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-14.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 2c8651cfc08b1f35b17919ee5379f2fa006af3ec809f11b3b7f470785580542b
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-15.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: a560aba47bccd7510a6ac77f671fe75dca3800f05cf6d676910c311a8f8ff079
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-16.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 55676320d2d1b6e6909f5b3d741a7cf0cefc84e920aa4474afc091459111c2e3
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-2.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 9569c67f22d0b57790e1c407c6eecf227e4562322dc296de43cdab7a0152ec73
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-3.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: af8f1e12df79b768322b5e83613698cdc619438270a2fc359554331c805054a3
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-4.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 72f29c92344d351fe9e74a946e7e0468d76d542c6894ff82982cb652ebe0feb7
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-5.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: fa9b1f3f5242df43e2e7bca80e9b6997c67944f20a4af91ee06bacc4e132d9c9
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-6.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 85bb7b5c2dc75975afebe5743935ba4ed5a09c1e9e34e9bfb2ff80293f5d8bbc
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-7.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f53d8aeba36314ef950eef02ffcf11dff540638ce27dfe7a86b6ccc6875afb24
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-8.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 7657a9ca3fa875990da960d3f812eea28dcd0ae6ed55a18d5394303c86f5484b
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-jis0208.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: cbaa91f3deb7d0841faf5c33041fc15a285da0e87e64ab802c4bf04b7c4da861
# Date: 2016-10-24
# Date: 2017-05-06

0 0x3000   (IDEOGRAPHIC SPACE)
1 0x3001 、 (IDEOGRAPHIC COMMA)
2 changes: 1 addition & 1 deletion index-jis0212.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 83bf90dd1c591a4355730d8c4567efc499d74da7490531019ef22a879991cfb7
# Date: 2016-10-24
# Date: 2017-05-06

108 0x02D8 ˘ (BREVE)
109 0x02C7 ˇ (CARON)
2 changes: 1 addition & 1 deletion index-koi8-r.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: c5497cd9071cb352c0e56b219154e539badf63de40b71578f09e2e11fe7d50ae
# Date: 2016-10-24
# Date: 2017-05-06

0 0x2500 ─ (BOX DRAWINGS LIGHT HORIZONTAL)
1 0x2502 │ (BOX DRAWINGS LIGHT VERTICAL)
2 changes: 1 addition & 1 deletion index-koi8-u.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 19a4da2c3f245118bbc8019326f45a07832949938ff903f03d62ac4da1f61f40
# Date: 2016-10-24
# Date: 2017-05-06

0 0x2500 ─ (BOX DRAWINGS LIGHT HORIZONTAL)
1 0x2502 │ (BOX DRAWINGS LIGHT VERTICAL)
2 changes: 1 addition & 1 deletion index-macintosh.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f2c6a4f6406b3e86a50a5dba4d2b7dd48e2e33c0d82aefe764535c934ec11764
# Date: 2016-10-24
# Date: 2017-05-06

0 0x00C4 Ä (LATIN CAPITAL LETTER A WITH DIAERESIS)
1 0x00C5 Å (LATIN CAPITAL LETTER A WITH RING ABOVE)
2 changes: 1 addition & 1 deletion index-windows-1250.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 0669455a7a1c70ba6003ea737991e8ee9adc455125c13cfe6705a361358de5fa
# Date: 2016-10-24
# Date: 2017-05-06

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1251.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 7592ef921679ba168b00a9e9afa3b4eebd67bf13dc7e84c4b6e120de856826e0
# Date: 2016-10-24
# Date: 2017-05-06

0 0x0402 Ђ (CYRILLIC CAPITAL LETTER DJE)
1 0x0403 Ѓ (CYRILLIC CAPITAL LETTER GJE)
2 changes: 1 addition & 1 deletion index-windows-1252.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: e56d49d9176e9a412283cf29ac9bd613f5620462f2a080a84eceaf974cfa18b7
# Date: 2016-10-24
# Date: 2017-05-06

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1253.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 49fdc881a3488904dd1e8dfba9aef3258454249958b611bcded1d4c981ab5561
# Date: 2016-10-24
# Date: 2017-05-06

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1254.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: e80a27adf377438be8ba5bd223875ea56d6a4d47f958cce1c957a2c446825caa
# Date: 2016-10-24
# Date: 2017-05-06

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)