Date: Mon, 7 Dec 2020 10:20:57 +0000
Hi Philipp,
You can already use any character (including '@' and '$') in comments, portably!
If you have written '@' or '$' in a comment, then it implies that:
- you must have written the file using a source encoding that can represent these characters
- you must be compiling the source file with a compiler which understands that source file encoding
In phase 1 of translation, an implementation-defined process converts the file to a sequence of basic source characters and universal-character-names:
[lex.phases 1.1]
> Physical source file characters are mapped, in an implementation-defined
> manner, to the basic source character set (introducing new-line characters
> for end-of-line indicators) if necessary. The set of physical source file
> characters accepted is implementation-defined. Any source file character
> not in the basic source character set is replaced by the
> universal-character-name that designates that character.
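As a rough illustration of the quoted phase 1 wording, here is a minimal sketch that models the mapping in ordinary C++. The helper names `is_basic_source_char` and `map_phase1` are hypothetical, the input is assumed to be already decoded to code points, and the character list is the 96-character basic source character set as I understand it for C++20 and earlier. A real implementation does this inside the compiler, in an implementation-defined manner.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: true if c belongs to the basic source character set
// (assumed here to be the 96 characters of C++20 and earlier).
bool is_basic_source_char(char32_t c) {
    static const std::u32string basic =
        U" \t\v\f\n"
        U"abcdefghijklmnopqrstuvwxyz"
        U"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        U"0123456789"
        U"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    return basic.find(c) != std::u32string::npos;
}

// Hypothetical model of phase 1: basic characters pass through unchanged,
// every other character is replaced by the universal-character-name
// that designates it.
std::string map_phase1(const std::u32string& physical) {
    std::string out;
    for (char32_t c : physical) {
        if (is_basic_source_char(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[11];  // "\U" + 8 hex digits + terminating NUL
            if (c <= 0xFFFF)
                std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
            else
                std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Under this model, the physical characters '@' and '$' become `\u0040` and `\u0024`, which is why they do not need to be members of the basic source character set to survive phase 1.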
The definition of a comment does not depend on the basic source character set:
[lex.comment 1]
> The characters /* start a comment, which terminates with the characters */.
> These comments do not nest. The characters // start a comment, which
> terminates immediately before the next new-line character. If there is a
> form-feed or a vertical-tab character in such a comment, only whitespace
> characters shall appear between it and the new-line that terminates the
> comment; no diagnostic is required.
For portable use of '@' or '$' in comments, therefore, you only need to use a source file encoding that is supported by the vast majority of implementations. 7-bit ASCII satisfies that requirement!
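A small sketch of what that looks like in practice, assuming the file is saved in ASCII (or an ASCII-compatible encoding such as UTF-8) and the compiler accepts that encoding. The function name `comment_demo` and the address are placeholders for illustration only.

```cpp
// Maintainer: someone@example.org; estimated cost: $5 per unit.
/* Block comments may contain '@' and '$' as well. */
int comment_demo() {
    return 42;  // '@' and '$' here are removed together with the comment
}
```

The comments are replaced by whitespace in phase 3, so the characters inside them never reach the later phases of translation.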
The same constraint applies to both string and character literals. Thanks to the universal-character-name mechanism, portability issues can arise only from implementation-defined behaviour, specifically the source file encoding and the assumed execution encoding.
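To make the point about literals concrete, here is a minimal sketch. It assumes a toolchain whose source and execution encodings can both represent '@' and '$' (true of the common ASCII and UTF-8 configurations); the function names are hypothetical.

```cpp
#include <string>

// Spelled directly: relies on the source file encoding representing '@' and '$'.
std::string direct_spelling() { return "user@example.org pays $5"; }

// Spelled with universal-character-names: the source file itself
// needs nothing beyond the basic source character set.
// Note that \u takes exactly four hex digits, so "\u00245" is '$' followed by '5'.
std::string ucn_spelling() { return "user\u0040example.org pays \u00245"; }

// The same comparison for character literals.
bool same_character_literals() {
    return '@' == '\u0040' && '$' == '\u0024';
}
```

On such a toolchain both spellings should yield the same string and the same character values. What can still differ between implementations is whether the source encoding accepts the directly spelled characters and how the execution encoding represents them.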
Best regards,
Peter
> -----Original Message-----
> From: SG16 <sg16-bounces_at_lists.isocpp.org> On Behalf Of Philipp Klaus Krause
> via SG16
> Sent: 07 December 2020 10:02
> To: SG16 <sg16_at_lists.isocpp.org>
> Cc: Philipp Klaus Krause <krauseph_at_informatik.uni-freiburg.de>
> Subject: Re: [SG16] [ WG14 ] Mixed Wide String Literals
>
> Am 07.12.20 um 09:50 schrieb Corentin Jabot:
> >
> > Will there be problems for EBCDIC systems? AFAIK, C++ dropped support
> > for EBCDIC the moment there was no IBM representative in WG14. But C
> > still supports it.
> >
> >
> >
> > Unicode identifiers happen in C++ after phase 1 and so neither the
> > source encoding nor the execution encoding are impacted.
> >
>
> I see. Thanks.
>
> >
> > P.S.: A related topic, but far less ambitious: Would it make sense to
> > add @ and $ to the basic source character set
> > (http://www.colecovision.eu/stuff/proposal-basic-@.html)? AFAIK, this
> > should work for implementations that use ASCII (or an extension, such as
> > UTF-8) as well as those that use an EBCDIC code page that can be used
> > for C programming today.
> >
> >
> > It's actually more ambitious I am afraid. This question is asked
> > frequently in WG21 and 2 arguments against it are
> >
> > @ is used by objective-c(++) and $ is 1/valid in identifiers in some gcc
> > implementation extension 2/ used by code generators.
>
> I see that as an argument against allowing them in identifiers. But I
> don't see the argument against adding them to the basic character set,
> so one could portably use them in comments, string and character literals.
Received on 2020-12-07 04:21:21