Subject: To which extent characters can be replaced or removed in phase 1?
From: Corentin (corentin.jabot_at_[hidden])
Date: 2020-05-28 07:50:16
This GCC issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38433 argues
that it is valid for an implementation to remove trailing whitespace as
part of the implementation-defined mapping described in translation
phase 1.
Is that the intent of the wording?
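To make the question concrete, here is a minimal sketch of one case where
stripping trailing whitespace in phase 1 is observable. Both function names
are hypothetical and belong to no real compiler: strip_trailing_whitespace
stands in for an implementation-defined phase 1 mapping, and splice_lines
is a simplified model of the phase 2 rule that deletes a backslash
immediately followed by a new-line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical phase 1 mapping (an assumption for illustration):
// drop spaces and tabs at the end of every line.
std::string strip_trailing_whitespace(const std::string& source) {
    std::string out;
    std::string pending;  // whitespace not yet known to be trailing
    for (char c : source) {
        if (c == ' ' || c == '\t') {
            pending += c;
        } else if (c == '\n') {
            pending.clear();  // the buffered whitespace was trailing: discard it
            out += c;
        } else {
            out += pending;   // the buffered whitespace was interior: keep it
            pending.clear();
            out += c;
        }
    }
    return out;  // whitespace left at the very end of the input is dropped too
}

// Simplified model of phase 2: delete each backslash that is
// immediately followed by a new-line character.
std::string splice_lines(const std::string& source) {
    std::string out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] == '\\' && i + 1 < source.size() && source[i + 1] == '\n') {
            ++i;  // skip the new-line as well
        } else {
            out += source[i];
        }
    }
    return out;
}
```

Given the two-line input "// note \ " followed by "int x = 1;", where a
single space follows the backslash, splice_lines alone leaves the text
unchanged and the declaration survives. Applying
strip_trailing_whitespace first removes that space, so the backslash now
precedes the new-line, the lines are spliced, and the declaration becomes
part of the comment.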
Should it be specified that this implementation-defined mapping must
preserve the semantics of each abstract character present in the physical
source file?
If not, is it valid for an implementation to perform arbitrary text
transformations in phase 1, such as replacing "private" with "public" or
replacing every "e" with a "z"?
For reference, here is the definition of abstract character in Unicode 13:
Abstract character: A unit of information used for the organization,
control, or representation of textual data.
• When representing data, the nature of that data is generally symbolic as
opposed to some other kind of data (for example, aural or visual). Examples
of such symbolic data include letters, ideographs, digits, punctuation,
technical symbols, and dingbats.
• An abstract character has no concrete form and should not be confused
with a glyph.
• An abstract character does not necessarily correspond to what a user
thinks of as a “character” and should not be confused with a grapheme.
• The abstract characters encoded by the Unicode Standard are known as
Unicode abstract characters.
• Abstract characters not directly encoded by the Unicode Standard can
often be represented by the use of combining character sequences.
SG16 list run by email@example.com