Re: [SG16] Just realized that the UCN is not a single character

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 8 Apr 2020 18:45:54 -0400
In phase 1 the bare accent is translated to a universal-character-name, so I
don't think there is a detectable difference, at least not that way, between
#define accent(x) x ## \u0300 and #define accent(x) x ## `
Now, in phase 3 we are converting to preprocessing-tokens and white space. So
we get {x}{white-space}{##}{white-space} and then, I think, {\u0300}, not
{\}{u}{0}{3}{0}{0}, because \u0300 falls under "each non-white-space character
that cannot be one of the above". The theory is that if it weren't a
combining character but instead something valid, like \u00C0 (À), we would
expect the result to be the two characters pasted together. That is, a
'universal-character-name' names a character.
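
To make the À case concrete, here is a minimal sketch. It assumes a compiler that accepts universal-character-names in identifiers and treats \u00C0 as a single identifier token when pasting; read_pasted and check_paste are hypothetical names used only for illustration. It does not settle the \u0300 question, where implementations may disagree.

```cpp
#include <cassert>

// Same shape as the macro in the thread, but with U+00C0 (À), which is
// valid in an identifier, instead of the combining accent U+0300.
#define accent(x) x ## \u00C0

// Assumption: the paste yields the single identifier a\u00C0.
int accent(a) = 42;

// Hypothetical helper: names the pasted identifier directly by its UCN
// spelling, so this only compiles if the paste produced that identifier.
int read_pasted() { return a\u00C0; }

// Hypothetical helper: fails if the two spellings name different variables.
void check_paste() { assert(read_pasted() == 42); }
```

If the universal-character-name were instead split into {\} and {u00C0}, the paste of a with a lone backslash would not form a valid preprocessing token, and this sketch would be expected to fail to compile.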

This is one of the reasons to eventually get back to my paper that tries to
clean up 'character' and related terms: they turn out to be much more
complex and vague than expected. The paper needs substantive revision,
though, because I had interpreted some of the ambiguous parts differently
from how they seem to be intended.

On Wed, Apr 8, 2020 at 6:00 PM Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> On 08/04/2020 23.49, Zach Laine via SG16 wrote:
> > On Wed, Apr 8, 2020 at 4:38 PM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
> >
> > On 08/04/2020 23.27, Hubert Tong via SG16 wrote:
> > > Seems GCC is right again...
> > >
> > > \u0300, whether the result of forming a UCN or physically present
> > > as a UCN is a string of six characters from the basic source
> > > character set...
> >
> > So, it's six characters, so
> >
> > #define accent(x) x ## \u0300
> >
> > becomes
> >
> >
> > #define accent(x) x ## \ u0300
> >
> > with \ a lone character and u0300 a separate preprocessing-token /
> > identifier.
> >
> > Disturbing UCNs like that is really counter-intuitive.
> > We should fix that.
> >
> > Jens
> >
> >
> > Does that also imply that this:
> >
> > #define accent(x) x ## \u0300
> >
> > and this:
> >
> > #define accent(x) x ## `
> >
> > are not equivalent? If so, I'm even more disturbed.
>
> Yes. Be disturbed.
>
> Jens
>

Received on 2020-04-08 17:49:00