Date: Thu, 28 May 2020 14:34:28 +0100
Whether it was the original intent or not, existing implementations rely on
this freedom, and have reported significant code breakage if they try to
remove quirky stateful encodings that remove trailing whitespace in very
specific contexts.
If we were to consider nailing down a spirit for this law, it should be in
conjunction with WG14, as it will impact on header portability between
the two languages.
In short, I think this is a question for EWG, not Core, although it would
be great to understand implementers' opinions before raising the topic
there.
AlisdairM
> On May 28, 2020, at 14:03, Peter Brett via SG16 <sg16_at_[hidden]> wrote:
>
> I’m pretty sure it’s not the intent of the wording, and I would like our friendly wordsmiths to help work out how to clarify that “mapped” does in fact mean “mapped” and not “arbitrarily transformed”.
>
> Peter
>
>
> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Corentin via SG16
> Sent: 28 May 2020 13:50
> To: C++ Core Language Working Group <core_at_[hidden]>
> Cc: Corentin <corentin.jabot_at_[hidden]>; SG16 <sg16_at_[hidden]>
> Subject: [SG16] To which extent characters can be replaced or removed in phase 1?
>
> Hello,
>
> This GCC issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38433 argues that it is valid
> for an implementation to remove trailing whitespace as part of the implementation-defined mapping described in translation phase 1. [lex.phases]
>
> Is that the intent of the wording?
> Should it be specified that this implementation-defined mapping must preserve the semantics of each abstract character present in the physical source file?
> If not, is it valid for an implementation to perform arbitrary text transformations in phase 1, such as replacing "private" with "public" or replacing every "e" with a "z"?
>
> Thanks,
>
> Corentin
>
>
> For reference, here is the definition of abstract character in Unicode 13:
> http://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G2212
>
> Abstract character: A unit of information used for the organization, control, or representation of textual data.
> • When representing data, the nature of that data is generally symbolic as
> opposed to some other kind of data (for example, aural or visual). Examples of
> such symbolic data include letters, ideographs, digits, punctuation, technical
> symbols, and dingbats.
> • An abstract character has no concrete form and should not be confused with a
> glyph.
> • An abstract character does not necessarily correspond to what a user thinks of
> as a “character” and should not be confused with a grapheme.
> • The abstract characters encoded by the Unicode Standard are known as Unicode abstract characters.
> • Abstract characters not directly encoded by the Unicode Standard can often be
> represented by the use of combining character sequences.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
Received on 2020-05-28 08:37:37