Subject: Re: [ WG14 ] Mixed Wide String Literals
From: Peter Brett (pbrett_at_[hidden])
Date: 2020-12-07 04:20:57
You can already use any character (including '@' and '$') in comments, portably!
If you have written '@' or '$' in a comment, then it implies that:
- you must have written the file using a source encoding that can represent these characters
- you must be compiling the source file with a compiler which understands that source file encoding
In phase 1 of translation, some implementation-defined process converts the file to a sequence of basic source characters and universal character names:
> Physical source file characters are mapped, in an implementation-defined
> manner, to the basic source character set (introducing new-line characters
> for end-of-line indicators) if necessary. The set of physical source file
> characters accepted is implementation-defined. Any source file character
> not in the basic source character set is replaced by the
> universal-character-name that designates that character.
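As a rough illustration of that mapping (a conceptual sketch only, not what any particular compiler does internally), the helper names below are hypothetical. The code assumes the input is a Unicode code point and that the execution character set is ASCII-compatible:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the code point is a member of the C
   basic source character set, using Unicode/ASCII code point values. */
int in_basic_source_set(unsigned long cp)
{
    /* The 29 graphic characters; note that '@', '$' and '`' are absent. */
    const char graphic[] = "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";
    if (cp >= 0x41 && cp <= 0x5A) return 1;   /* A-Z */
    if (cp >= 0x61 && cp <= 0x7A) return 1;   /* a-z */
    if (cp >= 0x30 && cp <= 0x39) return 1;   /* 0-9 */
    if (cp == 0x20 || (cp >= 0x09 && cp <= 0x0C))
        return 1;                             /* space, HT, LF, VT, FF */
    /* Assumes an ASCII-compatible execution character set. */
    return cp > 0x20 && cp < 0x7F && strchr(graphic, (int)cp) != NULL;
}

/* Hypothetical helper: writes the conceptual phase 1 spelling of one code
   point into out: the character itself if it is in the basic source
   character set, otherwise a universal-character-name. */
int to_phase1_spelling(unsigned long cp, char *out, size_t n)
{
    if (in_basic_source_set(cp))
        return snprintf(out, n, "%c", (int)cp);
    if (cp <= 0xFFFF)
        return snprintf(out, n, "\\u%04lX", cp);
    return snprintf(out, n, "\\U%08lX", cp);
}
```

Under those assumptions, passing 0x40 ('@') produces the six characters \u0040, while passing 0x41 produces a plain A.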
The definition of a comment does not depend on the basic source character set:
> The characters /* start a comment, which terminates with the characters */.
> These comments do not nest. The characters // start a comment, which
> terminates immediately before the next new-line character. If there is a
> form-feed or a vertical-tab character in such a comment, only whitespace
> characters shall appear between it and the new-line that terminates the
> comment; no diagnostic is required.
For portable use of '@' or '$' in comments, therefore, you only need to use a source file encoding that is supported by the vast majority of implementations. 7-bit ASCII satisfies that requirement!
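A minimal sketch of what that looks like in practice, assuming the file is saved as ASCII (or an ASCII-compatible encoding the compiler accepts). The function name and the address in the comment are made up for illustration:

```c
/* Maintainer: someone@example.invalid. Budget: $5 per build. */
int comment_demo(void)
{
    // The '@' and '$' above are part of a comment, and each comment is
    // replaced by a single space in translation phase 3, so they never
    // become tokens.
    return 42;
}
```

Nothing outside the comments uses '@' or '$', so the only thing the implementation has to cope with is the source file encoding.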
The same constraint applies to both string and character literals. Thanks to the universal-character-name mechanism, portability issues can arise only from implementation-defined behaviour: specifically, the source file encoding and the assumed execution encoding.
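To make the literal case concrete, here is a small sketch comparing '@' and '$' written directly with the same characters spelled as universal-character-names. The function name is hypothetical, and I am assuming a C99-or-later compiler whose source and execution encodings are both ASCII-compatible:

```c
#include <string.h>

/* Returns 1 if the directly written literals and the UCN-spelled
   literals denote the same execution characters. */
int literals_match(void)
{
    const char direct[]  = "@$";
    const char via_ucn[] = "\u0040\u0024";  /* UCN spellings of '@' and '$' */
    return strcmp(direct, via_ucn) == 0 && '@' == '\u0040';
}
```

On such an implementation I would expect this to return 1. On an implementation with a different execution encoding the values of the characters may differ, but the two spellings should still agree with each other.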
> -----Original Message-----
> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Philipp Klaus Krause
> via SG16
> Sent: 07 December 2020 10:02
> To: SG16 <sg16_at_[hidden]>
> Cc: Philipp Klaus Krause <krauseph_at_[hidden]>
> Subject: Re: [SG16] [ WG14 ] Mixed Wide String Literals
> Am 07.12.20 um 09:50 schrieb Corentin Jabot:
> > Will there be problems for EBCDIC systems? AFAIK, C++ dropped support
> > for EBCDIC the moment there was no IBM representative in WG14. But C
> > still supports it.
> > Unicode identifiers happen in C++ after phase 1 and so neither the
> > source encoding nor the execution encoding are impacted.
> I see. Thanks.
> > P.S.: A related topic, but far less ambitious: Would it make sense to
> > add @ and $ to the basic source character set? AFAIK, this
> > should work for implementations that use ASCII (or an extension, such
> > UTF-8) as well as those that use an EBCDIC code page that can be used
> > for C programming today.
> > It's actually more ambitious I am afraid. This question is asked
> > frequently in WG21 and 2 arguments against it are
> > @ is used by objective-c(++) and $ is 1/valid in identifiers in some gcc
> > implementation extension 2/ used by code generators.
> I see that as an argument against allowing them in identifiers. But I
> don't see the argument against adding them to the basic character set,
> so one could portably use them in comments, string and character literals.