Subject: Re: [isocpp-core] To which extent characters can be replaced or removed in phase 1?
From: Nathan Sidwell (nathan_at_[hidden])
Date: 2020-06-02 11:28:00
On 5/28/20 8:50 AM, Corentin via Core wrote:
> This GCC issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38433 argues that it
> is valid for an implementation to remove trailing whitespaces as part of the
> implementation defined mapping described in translation phase 1. [lex.phases]
> Is it the intent of that wording?
> Should it be specified that this implementation defined mapping should preserve
> the semantic of each abstract character present in the physical source file?
> If not, is it a valid implementation to perform arbitrary text transformation in
> phase 1 such as replacing "private" by "public" or replacing all "e" by a "z" ?
FWIW, I recently encountered this while reimplementing raw string literal lexing.
We do not get that right when this extension is in play:
/* GNU backslash whitespace newline extension. FIXME
could be any sequence of non-vertical space. When we
can properly restore any such sequence, we should
mark this note as handled so _cpp_process_line_notes
doesn't warn. */
I do not know how easy fixing that could be. I retained the comment in a
bug-compatible reworking and moved on.
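
To illustrate why this is hard to undo, here is a minimal sketch. It is not
GCC's code; splice_lines and check_examples are hypothetical names made up
for this example. It models line splicing in two modes: a strict one, where
only a backslash immediately followed by a newline is a splice, and a lenient
one modelled on the extension above, where blanks between the backslash and
the newline are also swallowed. Raw string literals require the phase 1 and 2
transformations to be reverted, and the lenient mode maps several distinct
inputs to the same output, so the original text cannot be restored unless the
consumed blanks are recorded somewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not taken from libcpp).
// Removes line splices from `src`.
//   strict == true : only backslash immediately followed by '\n' is a splice.
//   strict == false: backslash, then any spaces/tabs, then '\n' is also a
//                    splice (an assumption modelled on the GNU extension).
std::string splice_lines(const std::string& src, bool strict) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\') {
            std::size_t j = i + 1;
            if (!strict) {
                // Skip horizontal whitespace after the backslash.
                while (j < src.size() && (src[j] == ' ' || src[j] == '\t'))
                    ++j;
            }
            if (j < src.size() && src[j] == '\n') {
                i = j;  // drop the backslash, any blanks, and the newline
                continue;
            }
        }
        out += src[i];
    }
    return out;
}

// Shows that the lenient mode is lossy: two different inputs collapse to
// the same spliced text, so the splice cannot be reverted from it alone.
void check_examples() {
    const std::string plain = "x\\\ny";      // backslash, newline
    const std::string blanks = "x\\ \t\ny";  // backslash, space, tab, newline
    assert(splice_lines(plain, true) == "xy");
    assert(splice_lines(blanks, true) == blanks);  // strict: not a splice
    assert(splice_lines(blanks, false) == splice_lines(plain, false));
}
```

Calling check_examples() exercises both modes. The last assertion is the
interesting one: after lenient splicing, nothing in the result says whether
the source had trailing blanks after the backslash.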
> For reference here is the definition of abstract character in Unicode 13
> Abstract character: A unit of information used for the organization, control, or
> representation of textual data.
> • When representing data, the nature of that data is generally symbolic as
> opposed to some other kind of data (for example, aural or visual). Examples of
> such symbolic data include letters, ideographs, digits, punctuation, technical
> symbols, and dingbats.
> • An abstract character has no concrete form and should not be confused with a
> glyph.
> • An abstract character does not necessarily correspond to what a user thinks of
> as a “character” and should not be confused with a grapheme.
> • The abstract characters encoded by the Unicode Standard are known as Unicode
> abstract characters.
> • Abstract characters not directly encoded by the Unicode Standard can often be
> represented by the use of combining character sequences.
> Core mailing list
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2020/05/9153.php
-- Nathan Sidwell