Preserved from the old std-text-wg@googlegroups.com mailing list.

-------- Forwarded Message --------
Subject: Shift-JIS NEC/IBM discussion
Date: Mon, 26 Feb 2018 18:18:25 +0000
From: Mark Zeren <mzeren@vmware.com>
To: std-text-wg <std-text-wg@googlegroups.com>


Copied from Slack for safekeeping:

 

sdowney [1 hour ago]

Shift-JIS has a few hundred distinct character pairs that were unified into the same Unicode code points?

 

 

rmf [24 minutes ago]

That's only half correct. There are several problematic characters in the Japanese encoding standards, but this isn't an issue with Han unification.

 

 

rmf [22 minutes ago]

Those pairs are pairs *of the same character*, which happens to exist *twice* in common Shift-JIS codepages, like Microsoft's cp932.

 

 

rmf [20 minutes ago]

Those code pages encode the same character twice because of the way Shift-JIS extensions occurred. Almost all of the problematic characters were added by NEC and by IBM at separate Shift-JIS code points.

 

 

rmf [19 minutes ago]

Because these pairs don't overlap, Microsoft's code page doubles as an IBM-compatible Shift-JIS and as a NEC-compatible Shift-JIS by mapping both.
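
A minimal sketch of how one might list the doubly encoded characters, assuming Python's built-in "cp932" codec follows Microsoft's mapping table. The function name cp932_duplicates and the byte ranges used for lead and trail bytes are choices made for this illustration, not something taken from the thread.

```python
from collections import defaultdict


def cp932_duplicates():
    """Return {character: [byte pairs]} for characters with more than one
    double-byte cp932 encoding, according to Python's cp932 codec."""
    seen = defaultdict(list)
    # Double-byte lead bytes are 0x81-0x9F and 0xE0-0xFC;
    # trail bytes are 0x40-0x7E and 0x80-0xFC.
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0x80, 0xFD))
    for lead in lead_bytes:
        for trail in trail_bytes:
            raw = bytes([lead, trail])
            try:
                ch = raw.decode("cp932")
            except UnicodeDecodeError:
                continue  # position not assigned in this codec
            if len(ch) == 1:
                seen[ch].append(raw)
    return {ch: raws for ch, raws in seen.items() if len(raws) > 1}


dups = cp932_duplicates()
print(len(dups), "characters have more than one cp932 encoding")
```

The exact count depends on the codec's table, so it is printed rather than stated here.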

 

 

rmf [11 minutes ago]

So yes, some Shift-JIS codepages, like cp932, don't roundtrip with naive processes, but:

 

 

rmf [10 minutes ago]

1. the problem is specific to the code pages and unrelated to Han unification

 

 

rmf [10 minutes ago]

2. the problem is actually irrelevant unless you're interacting with e.g. NEC-only or IBM-only systems

 

 

rmf [10 minutes ago]

And 3. Unicode has mechanisms to actually roundtrip this properly if you need it.

 

 

rmf [7 minutes ago]

(If you want an example: NEC encoded 纊 at Shift-JIS position 0xED40; IBM encoded it at Shift-JIS position 0xFA5C; Unicode has it at U+7E8A)
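
The example above can be checked with a short sketch, again assuming Python's built-in "cp932" codec mirrors Microsoft's code page. The variable names are illustrative. Which of the two byte sequences the encoder emits is left to the codec's table, so the code only checks that exactly one of them fails to survive a naive decode/encode round trip.

```python
NEC = b"\xed\x40"  # NEC position from the example
IBM = b"\xfa\x5c"  # IBM position from the example

ch_nec = NEC.decode("cp932")
ch_ibm = IBM.decode("cp932")

# Both byte sequences decode to the same Unicode character, U+7E8A.
print(ch_nec == ch_ibm == "\u7e8a")

# Encoding can only pick one of the two positions.
reencoded = ch_nec.encode("cp932")
print("re-encoded as", reencoded.hex())

# A naive bytes -> str -> bytes round trip therefore loses one of them.
lost = [raw for raw in (NEC, IBM)
        if raw.decode("cp932").encode("cp932") != raw]
print("not preserved:", [raw.hex() for raw in lost])
```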

--
To view this discussion on the web visit https://groups.google.com/d/msgid/std-text-wg/75742882-938E-45C6-86F1-F541723431E8%40vmware.com.