Re: [SG16] Agreeing with Corentin's point re: problem with strict use of abstract characters

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sun, 14 Jun 2020 18:06:15 -0400
On Sun, Jun 14, 2020 at 2:44 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> Suppose I'm on an EBCDIC-only system.
> If I understand correctly, EBCDIC has a host of control characters
> that can't be represented in Unicode (while preserving semantics);
> the mapping that exists re-uses some Unicode control characters,
> but with different semantics.
> Suppose one of those EBCDIC control characters is mapped to \u1234.
> What if \u1234 also appears as such in my source code, obviously
> intending to mean the Unicode semantics?
> I think I'd like

I can't speak to whether you have such a preference or not. :)

> to have at least the option of getting a
> syntax error (I asked for a Unicode control character that
> doesn't exist as such on EBCDIC), but it seems the mapping
> will give me the EBCDIC control character with different
> semantics. (All of this in a string literal, of course,
> so it hurts when performing output.)
> Hubert, is my understanding above correct?

Yes. The possibility of such an option (to get an error) should not be
ruled out by some "accident of wording" in the specification of C++.

As a point of information (from another thread), there are known
implementations that perform the "conventional" mapping.

The presence of a UCN for a C1 (non-EBCDIC) control character in a
supposedly-EBCDIC string is not immediately indicative of an error. For
example, the environment I use has SSH perform autoconversion (with the
"conventional" mapping) from EBCDIC-1047 to ISO-8859-1.
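
To make the two behaviours concrete, here is a minimal sketch of the choice
being discussed. It is not taken from any implementation: the names C1Policy,
C1Entry, kConventionalC1 and encode_c1 are hypothetical, and the three table
entries are illustrative only and should be checked against the conversion
table actually in use. It assumes C++17.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical policy for a UCN that names a C1 control character when the
// literal encoding is EBCDIC.
enum class C1Policy {
    Conventional,  // silently re-use the EBCDIC control paired with it
    Strict         // report that no EBCDIC character has those semantics
};

// Hypothetical table entry: a Unicode C1 code point and the EBCDIC byte the
// "conventional" mapping pairs it with.
struct C1Entry {
    char32_t ucs;
    std::uint8_t ebcdic;
};

// Illustrative excerpt only (assumed values, not a complete or verified table).
constexpr C1Entry kConventionalC1[] = {
    {0x009C, 0x04},
    {0x0086, 0x06},
    {0x0097, 0x08},
};

// Returns the EBCDIC byte for a C1 code point under the chosen policy.
// An empty result stands for "the implementation would issue a diagnostic";
// code points outside U+0080..U+009F are out of scope for this sketch.
constexpr std::optional<std::uint8_t> encode_c1(char32_t cp, C1Policy policy) {
    if (cp < 0x80 || cp > 0x9F) {
        return std::nullopt;  // not a C1 control
    }
    if (policy == C1Policy::Strict) {
        return std::nullopt;  // Unicode semantics requested, none available
    }
    for (const C1Entry& e : kConventionalC1) {
        if (e.ucs == cp) {
            return e.ebcdic;  // EBCDIC control with different semantics
        }
    }
    return std::nullopt;  // not covered by this excerpt
}
```

Under Conventional, encode_c1(0x0086, ...) yields the byte 0x06, which is
what a round trip through the SSH autoconversion would expect. Under Strict,
the same call yields no value, which is the error Jens would like to be able
to get.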

> Jens

Received on 2020-06-14 17:09:42