Re: [SG16] D1949R4 - Unicode Identifiers

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 27 May 2020 12:15:21 +0200
On Wed, May 27, 2020, 00:20 Steve Downey via SG16 <sg16_at_[hidden]>
wrote:

>
> - For the #define and ## I am thinking of striking the discussion
> other than to point out there is implementation divergence. Something like:
>
> There is implementation divergence on how the concat operator, ##,
> works with combining characters. The code below is flagged as an error
> today by gcc, it is accepted by clang and msvc.
>
>     #define accent(x) x##\u0300
>
>     constexpr int accent(A) = 2;
>     constexpr int gv2 = A\u0300;
>     static_assert(gv2 == 2, "whatever");
>
> Compiler Explorer Link
> <https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,j:1,lang:c%2B%2B,selection:(endColumn:37,endLineNumber:5,positionColumn:1,positionLineNumber:1,selectionStartColumn:37,selectionStartLineNumber:5,startColumn:1,startLineNumber:1),source:'%23define+accent(x)+x%23%23%5Cu0300%0A%0Aconstexpr+int+accent(A)+%3D+2%3B%0Aconstexpr+int+gv2+%3D+A%5Cu0300%3B%0Astatic_assert(gv2+%3D%3D+2,+%22whatever%22)%3B%0A'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:50,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:conformance,i:(compilers:!((compilerId:clang_trunk,options:''),(compilerId:gsnapshot,options:''),(compilerId:vcpp_v19_24_x64,options:'')),editorid:1,langId:c%2B%2B,libs:!()),l:'5',n:'0',o:'Conformance+viewer+(Editor+%231)+3/10',t:'0')),header:(),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4>
>
> The ability to form valid identifiers via token pasting with combining
> characters is not a goal of this paper.
>
>
Haven't we talked about that many times?
\u0300 is not a valid identifier, so it should not be a valid pp-token, and
the above code should be ill-formed.

It's the only sane approach, and the only approach that doesn't require
renormalization of the concatenated sequence.

> On Tue, May 26, 2020 at 5:26 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>> On 26/05/2020 22.51, Steve Downey via SG16 wrote:
>> > Find attached a draft of the UAX31 paper for discussion.
>> > Viewable at
>> http://htmlpreview.github.io/?https://github.com/steve-downey/papers/blob/master/generated/p1949.html
>> > Source at https://github.com/steve-downey/papers/blob/master/p1949.md
>> >
>> > (note that github doesn't format the same way that mpark's WG21 format
>> does)
>>
>> Section 1, third bullet: add "be" (otherwise the sentence is missing a
>> verb)
>>
>>
>> "Note that #define and ## require identifiers as operands, and combining
>> characters are generally not allowed as leading characters in identifiers."
>>
>> This is correct for #define, but not for ##.
>> A pp-token is enough for ##; in particular, you can combine
>> an identifier with a pp-number (which is not an identifier).
>>
>>
>> "[ Note: In translation phase 4, identifier also includes those
>> preprocessing-tokens (5.4 [lex.pptoken]) differentiated as keywords (5.11
>> [lex.key]) in the later translation phase 7 (5.6 [lex.token]). – end note ]"
>>
>> Please make "identifier" and "preprocessing-token" italics (grammar
>> references).
>>
>>
>> Jens
>>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2020-05-27 05:18:40