Subject: Re: Agreeing with Corentin's point re: problem with strict use of abstract characters
From: Hubert Tong (hubert.reinterpretcast_at_[hidden])
Date: 2020-06-14 17:06:15
On Sun, Jun 14, 2020 at 2:44 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> Suppose I'm on an EBCDIC-only system.
> If I understand correctly, EBCDIC has a host of control characters
> that can't be represented in Unicode (while preserving semantics);
> the mapping that exists re-uses some Unicode control characters,
> but with different semantics.
> Suppose one of those EBCDIC control characters is mapped to \u1234.
> What if \u1234 also appears as such in my source code, obviously
> intending to mean the Unicode semantics?
> I think I'd like
I can't speak to whether you have such a preference or not. :)
> to have at least the option of getting a
> syntax error (I asked for a Unicode control character that
> doesn't exist as such on EBCDIC), but it seems the mapping
> will give me the EBCDIC control character with different
> semantics. (All of this in a string literal, of course,
> so it hurts when performing output.)
> Hubert, is my understanding above correct?
Yes. The possibility of such an option (to get an error) should not be
ruled out by some "accident of wording" in the specification of C++.
As a point of information (from another thread), implementations that
perform the "conventional" mapping are known to exist.
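
To make the two behaviours concrete, here is a minimal sketch of a translator that either applies the "conventional" mapping to a UCN naming a C1 control or refuses it so that a diagnostic can be issued. All names (C1Policy, encode_c1_ucn, and so on) are hypothetical and do not describe any real compiler's interface. The four pairings are an excerpt that I am assuming from the commonly published EBCDIC 037/1047 tables.

```cpp
#include <array>
#include <optional>

// Hypothetical policy switch: map conventionally, or refuse and diagnose.
enum class C1Policy { Conventional, Reject };

struct C1Pair {
    char32_t ucn;          // Unicode C1 control code point
    unsigned char ebcdic;  // EBCDIC byte paired with it conventionally
};

// Assumed excerpt of the conventional pairing. The EBCDIC control on the
// right need not share the semantics of the Unicode control on the left.
constexpr std::array<C1Pair, 4> kC1Excerpt{{
    {0x0085, 0x15},  // Unicode NEL <-> EBCDIC NL
    {0x0086, 0x06},  // Unicode SSA <-> EBCDIC RNL
    {0x009C, 0x04},  // Unicode ST  <-> EBCDIC SEL
    {0x009F, 0xFF},  // Unicode APC <-> EBCDIC EO
}};

constexpr bool is_c1_control(char32_t cp) { return cp >= 0x80 && cp <= 0x9F; }

// Returns the EBCDIC byte for a C1 UCN under the Conventional policy.
// Returns nullopt when the caller should diagnose: the policy is Reject,
// the code point is not a C1 control, or it is outside this excerpt.
inline std::optional<unsigned char> encode_c1_ucn(char32_t cp, C1Policy policy) {
    if (!is_c1_control(cp) || policy == C1Policy::Reject) return std::nullopt;
    for (const C1Pair& p : kC1Excerpt)
        if (p.ucn == cp) return p.ebcdic;
    return std::nullopt;
}
```

With this sketch, encode_c1_ucn(0x009C, C1Policy::Conventional) yields the byte 0x04, which an EBCDIC system treats as SEL and not as ST, while the same call with C1Policy::Reject yields no byte at all.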
The presence of a UCN for a C1 (non-EBCDIC) control character in a
supposedly-EBCDIC string is not immediately indicative of an error. For
example, the environment I use has SSH perform autoconversion (with the
"conventional" mapping) from EBCDIC-1047 to ISO-8859-1.
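
A rough sketch of why such a UCN can be intentional: with autoconversion in place, an EBCDIC control byte shows up on the ISO-8859-1 side as a C1 byte, and converting that byte back restores the original EBCDIC control. The function names and the four-entry table are hypothetical. The byte pairings are an excerpt that I am assuming from the commonly published EBCDIC 037/1047 tables, and this is not how SSH itself is implemented.

```cpp
#include <cstdint>
#include <optional>

// Assumed excerpt of the conventional EBCDIC <-> ISO-8859-1 byte pairing.
struct BytePair {
    std::uint8_t ebcdic;
    std::uint8_t latin1;
};

constexpr BytePair kExcerpt[] = {
    {0x04, 0x9C}, {0x06, 0x86}, {0x15, 0x85}, {0xFF, 0x9F},
};

// EBCDIC byte as seen on the ISO-8859-1 side; nullopt if outside the excerpt.
inline std::optional<std::uint8_t> ebcdic_to_latin1(std::uint8_t b) {
    for (const BytePair& p : kExcerpt)
        if (p.ebcdic == b) return p.latin1;
    return std::nullopt;
}

// Inverse direction; nullopt if outside the excerpt.
inline std::optional<std::uint8_t> latin1_to_ebcdic(std::uint8_t b) {
    for (const BytePair& p : kExcerpt)
        if (p.latin1 == b) return p.ebcdic;
    return std::nullopt;
}
```

Under these assumptions, someone who sees byte 0x9C through the converted view and writes \u009C in source may well mean the EBCDIC control at 0x04, which is exactly what the conventional mapping gives back.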
SG16 list run by firstname.lastname@example.org