Re: [SG16] [isocpp-core] To which extent characters can be replaced or removed in phase 1?

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Thu, 28 May 2020 10:08:27 -0400
On Thu, May 28, 2020 at 9:53 AM Jean-Marc Bourguet via SG16 <
sg16_at_[hidden]> wrote:

> I don't know what the intent is for source input, but when considering I/O,
> the C standard says
>
>
> > Data read in from a text stream will necessarily compare equal to the
> > data that were earlier written out to that stream only if: the data
> > consist only of printing characters and the control characters
> > horizontal tab and new-line; no new-line character is immediately
> > preceded by space characters; and the last character is a new-line
> > character. Whether space characters that are written out immediately
> > before a new-line character appear when read in is
> > implementation-defined.
>
> and I seem to remember implementations (VMS, IBM mainframes?) that use
> fixed-length records padded with spaces as their text format and thus
> cannot distinguish between intended and unintended spaces at the end
> of a line.
>
There would be an inherent ambiguity for raw strings, and removing the
implementation-definedness actually removes the requirement for the
implementation to document how that ambiguity is resolved.
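
A minimal sketch of that ambiguity, assuming a hypothetical storage model
in which every source line is padded with spaces to a fixed record width.
The names store_as_records and read_records are illustrative only and do
not describe any real implementation. Two source texts that differ only
in trailing spaces inside a raw string literal end up as the same records,
so a reader of those records cannot tell which one was written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a fixed-length-record text file: each line is
// stored padded with spaces to exactly `width` characters.
std::vector<std::string> store_as_records(const std::vector<std::string>& lines,
                                          std::size_t width) {
    std::vector<std::string> records;
    for (const std::string& line : lines) {
        assert(line.size() <= width);  // assumption: every line fits in one record
        std::string record = line;
        record.resize(width, ' ');     // pad with spaces up to the record width
        records.push_back(record);
    }
    return records;
}

// Reading back: padding cannot be told apart from spaces the author typed,
// so this reader drops every trailing space of each record.
std::vector<std::string> read_records(const std::vector<std::string>& records) {
    std::vector<std::string> lines;
    for (const std::string& record : records) {
        const std::size_t last = record.find_last_not_of(' ');
        lines.push_back(last == std::string::npos ? std::string()
                                                  : record.substr(0, last + 1));
    }
    return lines;
}
```

With a raw string literal that spans two lines, the version whose first
line ends in spaces and the version without them produce identical
records, so which of the two the program "contains" depends on a choice
the implementation makes when reading the records back.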


> If we want to cater for them, we have to allow
> implementations to ignore spaces before the end of a line. Whether they
> are still important to us is an open
> question. For me, VMS left my world 25 years ago and I've never been
> involved with mainframes.
>
> Yours,
>
> -- Jean-Marc
>
> ------------------------------
> *From:* Core <core-bounces_at_[hidden]> on behalf of Corentin via
> Core <core_at_[hidden]>
> *Sent:* Thursday, May 28, 2020 14:50
> *To:* C++ Core Language Working Group
> *Cc:* Corentin; SG16; Alisdair Meredith
> *Subject:* [SPAM] [isocpp-core] To which extent characters can be replaced
> or removed in phase 1?
>
> Hello,
>
> This GCC issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38433 argues
> that it is valid
> for an implementation to remove trailing whitespace as part of the
> implementation-defined mapping described in translation phase 1
> [lex.phases].
>
> Is that the intent of the wording?
> Should it be specified that this implementation-defined mapping should
> preserve the semantics of each abstract character present in the physical
> source file?
> If not, is it valid for an implementation to perform arbitrary text
> transformations in phase 1, such as replacing "private" with "public" or
> replacing every "e" with a "z"?
>
> Thanks,
>
> Corentin
>
>
> For reference, here is the definition of abstract character in Unicode 13:
> http://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G2212
>
> Abstract character: A unit of information used for the organization,
> control, or representation of textual data.
> • When representing data, the nature of that data is generally symbolic as
>   opposed to some other kind of data (for example, aural or visual).
>   Examples of such symbolic data include letters, ideographs, digits,
>   punctuation, technical symbols, and dingbats.
> • An abstract character has no concrete form and should not be confused
>   with a glyph.
> • An abstract character does not necessarily correspond to what a user
>   thinks of as a “character” and should not be confused with a grapheme.
> • The abstract characters encoded by the Unicode Standard are known as
>   Unicode abstract characters.
> • Abstract characters not directly encoded by the Unicode Standard can
>   often be represented by the use of combining character sequences.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
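
The question quoted above, whether phase 1 may drop trailing whitespace,
can be made concrete with a small sketch. It assumes a hypothetical phase 1
mapping (strip_trailing_whitespace) followed by a simplified model of
phase 2 line splicing (splice_lines) that ignores raw string literals.
Both function names are illustrative and not taken from any compiler.

```cpp
#include <cstddef>
#include <string>

// Hypothetical phase 1 mapping: drop spaces and tabs that immediately
// precede each new-line character.
std::string strip_trailing_whitespace(const std::string& source) {
    std::string out;
    for (char c : source) {
        if (c == '\n') {
            while (!out.empty() && (out.back() == ' ' || out.back() == '\t'))
                out.pop_back();
        }
        out.push_back(c);
    }
    return out;
}

// Simplified phase 2 line splicing: delete each backslash that is
// immediately followed by a new-line, together with that new-line.
std::string splice_lines(const std::string& source) {
    std::string out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] == '\\' && i + 1 < source.size() && source[i + 1] == '\n') {
            ++i;  // skip the new-line as well
            continue;
        }
        out.push_back(source[i]);
    }
    return out;
}
```

For a source line that ends in a backslash followed by a space, splicing
alone leaves the text unchanged, while stripping first turns the backslash
into a line continuation. If that line is a // comment, the next line is
then absorbed into the comment, so in this model the trimming is
observable in the meaning of the program.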

Received on 2020-05-28 09:11:50