[SG16-Unicode] Fwd: Shift-JIS NEC/IBM discussion

From: Tom Honermann <tom_at_[hidden]>
Date: Sun, 15 Apr 2018 10:08:15 -0400
Preserving from the old std-text-wg_at_[hidden] mailing list.

-------- Forwarded Message --------
Subject: Shift-JIS NEC/IBM discussion
Date: Mon, 26 Feb 2018 18:18:25 +0000
From: Mark Zeren <mzeren_at_[hidden]>
To: std-text-wg <std-text-wg_at_[hidden]>



copied from Slack for safe keeping:

sdowney [1 hour ago]

Shift-JIS has a few hundred distinct character pairs that were unified
into the same Unicode code points?

rmf [24 minutes ago]

That's only half correct. There are several problematic characters in
the Japanese encoding standards, but this isn't an issue with Han
unification.

rmf [22 minutes ago]

Those pairs are pairs *of the same character*, which happens to exist
*twice* in common Shift-JIS codepages, like Microsoft's cp932.

rmf [20 minutes ago]

The reason those code pages encode the same character twice is the way
Shift-JIS extensions occurred. Almost all of the problematic characters
were added by NEC and by IBM at separate Shift-JIS code points.

rmf [19 minutes ago]

Because these pairs don't overlap, Microsoft's code page doubles as an
IBM-compatible Shift-JIS and as a NEC-compatible Shift-JIS by mapping both.

rmf [11 minutes ago]

So yes, some Shift-JIS codepages, like cp932, don't roundtrip with naive
processes, but:

rmf [10 minutes ago]

1. the problem is specific to the code pages and unrelated to Han
unification

rmf [10 minutes ago]

2. the problem is actually irrelevant unless you're interacting with
e.g. NEC-only or IBM-only systems

rmf [10 minutes ago]

And 3. Unicode has mechanisms to actually roundtrip this properly if you
need it.

rmf [7 minutes ago]

(If you want an example: NEC encoded 纊 at Shift-JIS position 0xED40; IBM
encoded it at Shift-JIS position 0xFA5C; Unicode has it at U+7E8A)
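
The example above can be checked with a short Python sketch. It assumes Python's built-in "cp932" codec follows Microsoft's mapping for both the NEC and the IBM positions; the names NEC_BYTES, IBM_BYTES, EXPECTED and roundtrips are made up for illustration. Both byte pairs should decode to U+7E8A, and since the encoder has to pick a single byte pair for that code point, a naive decode/encode cycle can preserve only one of the two.

```python
# Byte pairs and code point are the ones quoted in the message above.
NEC_BYTES = bytes([0xED, 0x40])  # NEC position for the character
IBM_BYTES = bytes([0xFA, 0x5C])  # IBM position for the same character
EXPECTED = "\u7e8a"              # Unicode code point U+7E8A

def roundtrips(raw: bytes, codec: str = "cp932") -> bool:
    """Return True if decoding and re-encoding reproduces the input bytes."""
    return raw.decode(codec).encode(codec) == raw

# Assumption: the codec accepts both positions and maps them identically.
nec_text = NEC_BYTES.decode("cp932")
ibm_text = IBM_BYTES.decode("cp932")

print("same character:", nec_text == ibm_text == EXPECTED)
print("re-encoded as: ", EXPECTED.encode("cp932").hex())
print("NEC roundtrips:", roundtrips(NEC_BYTES))
print("IBM roundtrips:", roundtrips(IBM_BYTES))
```

Which of the two positions survives depends on the codec's encode table, so the sketch prints the result rather than assuming it.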


Received on 2018-04-15 16:08:21