
[SG16] Structure of EBCDIC MBCS and wide EBCDIC

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Wed, 6 Oct 2021 21:24:37 -0400
For information, since interest was expressed in today's meeting.

Wide characters are mostly a C/C++ invention. For EBCDIC encodings that do
not have multibyte characters, the wide encoding of a character is the
character's byte value, taken as an unsigned char, stored in a wchar_t.
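A minimal sketch of that rule, assuming a purely single-byte EBCDIC input. The function name `widen_sbcs` is hypothetical, and no code-page translation happens: each byte is zero-extended through unsigned char (so bytes such as X'C1' do not sign-extend on platforms where char is signed) and stored in a wchar_t.

```cpp
#include <string>

// Hypothetical helper: widen a single-byte EBCDIC string.
// Each wide value is just the byte's unsigned char value; the
// EBCDIC code points themselves are kept, not mapped to Unicode.
std::wstring widen_sbcs(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char first so 0x80..0xFF stay positive.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

For example, the bytes X'C1' X'81' X'40' widen to the wide values 0xC1, 0x81, 0x40.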

EBCDIC also has multibyte encodings. These are formed by pairing a
single-byte encoding with a double-byte encoding. The two are unified into
a multibyte, stateful "narrow" encoding by means of shift-out/shift-in
(SO/SI) control characters.
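To make the stateful part concrete, here is a sketch of building a mixed string from characters of the two component encodings. It uses the EBCDIC control codes SO (X'0E') and SI (X'0F'). The helper name `join_mixed` and its input representation (one std::string per character, 1 byte for single-byte and 2 bytes for double-byte) are hypothetical. It also assumes the stream starts shifted in and should end shifted in; the IBM document on shift characters is the authority on the exact rules.

```cpp
#include <string>
#include <vector>

// EBCDIC control codes that switch the shift state.
constexpr char kShiftOut = 0x0E;  // SO: following bytes are double-byte
constexpr char kShiftIn = 0x0F;   // SI: following bytes are single-byte

// Hypothetical helper: each element of `chars` holds one character's bytes
// from a component encoding (size 1 = single-byte, size 2 = double-byte).
// SO/SI are emitted only when the shift state actually changes.
std::string join_mixed(const std::vector<std::string>& chars) {
    std::string out;
    bool in_dbcs = false;  // assumed initial state: shifted in (single-byte)
    for (const std::string& ch : chars) {
        const bool is_dbcs = ch.size() == 2;
        if (is_dbcs != in_dbcs) {
            out.push_back(is_dbcs ? kShiftOut : kShiftIn);
            in_dbcs = is_dbcs;
        }
        out += ch;
    }
    // Assumption: the string should end back in the single-byte state.
    if (in_dbcs) {
        out.push_back(kShiftIn);
    }
    return out;
}
```

Consecutive double-byte characters share a single SO/SI pair, which is what makes the encoding stateful: the meaning of a byte depends on the last shift character seen.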

For a character from the single-byte component of a multibyte EBCDIC
encoding, the wide encoding is as described above. For a character from the
double-byte component, the wide encoding is the 16-bit value whose upper 8
bits are the first byte of the double-byte character and whose lower 8 bits
are the second byte.
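Putting both rules together, a sketch of converting a mixed SO/SI byte stream into wide values might look like the following. The name `widen_mixed` is hypothetical. The sketch assumes the stream starts in the single-byte state, treats SO (X'0E') and SI (X'0F') purely as state changes that produce no wide character, and throws on a double-byte character cut off by the end of input. As with the single-byte case, the values stay EBCDIC code points; nothing is mapped to Unicode.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

constexpr unsigned char kShiftOut = 0x0E;  // EBCDIC SO
constexpr unsigned char kShiftIn = 0x0F;   // EBCDIC SI

// Hypothetical helper: single-byte characters become their unsigned char
// value; each double-byte pair becomes (first << 8) | second.
std::wstring widen_mixed(const std::string& in) {
    std::wstring out;
    bool in_dbcs = false;  // assumed initial state: shifted in (single-byte)
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        if (b == kShiftOut) { in_dbcs = true;  ++i; continue; }
        if (b == kShiftIn)  { in_dbcs = false; ++i; continue; }
        if (!in_dbcs) {
            out.push_back(static_cast<wchar_t>(b));
            ++i;
        } else {
            if (i + 1 >= in.size()) {
                throw std::invalid_argument("truncated double-byte character");
            }
            const unsigned char b2 = static_cast<unsigned char>(in[i + 1]);
            // First byte -> upper 8 bits, second byte -> lower 8 bits.
            out.push_back(static_cast<wchar_t>((b << 8) | b2));
            i += 2;
        }
    }
    return out;
}
```

With this scheme, X'C1' X'0E' X'45' X'41' X'0F' X'C2' yields the wide values 0xC1, 0x4541, 0xC2. The 16-bit double-byte values fit in wchar_t whether it is 16 or 32 bits wide.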

The following document describes the shift state usage (the link still
works, although IBM reshuffled its documentation earlier this year):
When a shift character is in the data stream - IBM Documentation

To figure out which form of EBCDIC a CCSID refers to, see the following
document. It describes the "encoding schemes" (which, in this usage, are
closer to the nature of the encoding, or "meta encoding schemes") and the
scheme associated with various CCSIDs; it also includes names for the CCSIDs:
Scheme - IBM Documentation

The component single-byte and double-byte encodings for a multibyte EBCDIC
encoding can be found here:
MBCS CCSID decomposition - IBM Documentation

As a bonus, a table that maps between EBCDIC and Unix or "PC" encodings can
be found here:
Associated CCSIDs - IBM Documentation

Received on 2021-10-06 20:25:10