Date: Sun, 15 Apr 2018 10:08:15 -0400
Preserving from the old std-text-wg_at_[hidden] mailing list.
-------- Forwarded Message --------
Subject: Shift-JIS NEC/IBM discussion
Date: Mon, 26 Feb 2018 18:18:25 +0000
From: Mark Zeren <mzeren_at_[hidden]>
To: std-text-wg <std-text-wg_at_[hidden]>
copied from Slack for safe keeping:
sdowney [1 hour ago]
Shift-JIS has a few hundred distinct character pairs that were unified
into the same Unicode code points?
rmf [24 minutes ago]
That's only half correct. There are several problematic characters in
the Japanese encoding standards, but this isn't an issue with Han
unification.
rmf [22 minutes ago]
Those pairs are pairs *of the same character*, which happens to exist
*twice* in common Shift-JIS codepages, like Microsoft's cp932.
rmf [20 minutes ago]
Those code pages encode the same character twice because of the way
Shift-JIS was extended. Almost all of the problematic characters were
added by both NEC and IBM, at separate Shift-JIS code points.
rmf [19 minutes ago]
Because these pairs don't overlap, Microsoft's code page doubles as an
IBM-compatible Shift-JIS and as a NEC-compatible Shift-JIS by mapping both.
rmf [11 minutes ago]
So yes, some Shift-JIS codepages, like cp932, don't roundtrip with naive
processes, but:
rmf [10 minutes ago]
1. the problem is specific to the code pages and unrelated to Han
unification
rmf [10 minutes ago]
2. the problem is actually irrelevant unless you're interacting with
e.g. NEC-only or IBM-only systems
rmf [10 minutes ago]
And 3. Unicode has mechanisms to actually roundtrip this properly if you
need it.
rmf [7 minutes ago]
(If you want an example: NEC encoded 纊 at Shift-JIS position 0xED40; IBM
encoded it at Shift-JIS position 0xFA5C; Unicode has it at U+7E8A)
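A minimal sketch of the round-trip behaviour described above, using Python's built-in "cp932" codec as a stand-in for Microsoft's code page. The helper name roundtrip_report is hypothetical. Which of the two byte sequences the encoder emits depends on the codec's tables, so the sketch reports the result rather than assuming it.

```python
# Byte sequences taken from the example above: the NEC and IBM
# Shift-JIS positions for the same kanji (U+7E8A).
NEC_BYTES = bytes([0xED, 0x40])
IBM_BYTES = bytes([0xFA, 0x5C])


def roundtrip_report(raw: bytes, codec: str = "cp932") -> dict:
    """Decode raw bytes to Unicode, re-encode, and report whether the bytes survived.

    Hypothetical helper for illustration only; assumes raw decodes to
    exactly one character.
    """
    text = raw.decode(codec)
    back = text.encode(codec)
    return {
        "input": raw.hex(),
        "codepoint": f"U+{ord(text):04X}",
        "output": back.hex(),
        "roundtrips": back == raw,
    }


if __name__ == "__main__":
    for raw in (NEC_BYTES, IBM_BYTES):
        print(roundtrip_report(raw))
```

Both inputs decode to the same code point, so a naive decode-then-encode pass can preserve at most one of the two original byte sequences.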
Received on 2018-04-15 16:08:21
