Date: Wed, 1 Jul 2020 10:23:52 +0200
On Wed, 1 Jul 2020 at 10:14, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> On 01/07/2020 09.44, Corentin wrote:
> >
> >
> > On Wed, 1 Jul 2020 at 09:29, Jens Maurer via Core <core_at_[hidden]> wrote:
> > >
> > > We should be clear in the text whether an implementation is allowed
> > > to encode a sequence of non-numeric-escape-sequence s-chars as a
> > > whole, or whether each character is encoded separately. There was
> > > concern that "separately" doesn't address stateful encodings, where
> > > the encoding of string character i+1 may depend on what string
> > > character i was.
> >
> >
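To illustrate the stateful-encoding concern, here is a toy sketch. The encoding is hypothetical and only loosely modeled on ISO 2022 shift-out/shift-in (SO = 0x0E, SI = 0x0F). `encode_whole` and `encode_separately` are illustrative names, not any implementation's API. If each character is encoded separately, starting and ending in the initial shift state, the result differs from encoding the sequence as a whole:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy stateful encoding (loosely modeled on ISO 2022 SO/SI):
// code points below 0x80 are written as-is in the initial state; code points
// 0x80..0xFF are written as a single byte while in the "shifted" state.
constexpr unsigned char kShiftOut = 0x0E;  // SO: enter the shifted state
constexpr unsigned char kShiftIn = 0x0F;   // SI: return to the initial state

// Encodes the whole sequence with one running shift state and returns to the
// initial state at the end, so consecutive shifted characters share one SO.
std::vector<unsigned char> encode_whole(const std::u32string& s) {
    std::vector<unsigned char> out;
    bool shifted = false;
    for (char32_t c : s) {
        assert(c <= 0xFF && "the toy encoding only covers U+0000..U+00FF");
        const bool needs_shift = c >= 0x80;
        if (needs_shift != shifted) {
            out.push_back(needs_shift ? kShiftOut : kShiftIn);
            shifted = needs_shift;
        }
        out.push_back(static_cast<unsigned char>(c));
    }
    if (shifted) {
        out.push_back(kShiftIn);
    }
    return out;
}

// Encodes each character on its own, each starting and ending in the initial
// state, then concatenates the pieces.
std::vector<unsigned char> encode_separately(const std::u32string& s) {
    std::vector<unsigned char> out;
    for (char32_t c : s) {
        const std::vector<unsigned char> piece = encode_whole(std::u32string(1, c));
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}
```

For `U"a\u00E9\u00E8b"`, `encode_whole` yields `'a' SO E9 E8 SI 'b'` (6 bytes), while `encode_separately` yields `'a' SO E9 SI SO E8 SI 'b'` (8 bytes).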
> > We should be careful not to change the behavior here.
> > Encoding sequences as a whole allows an implementation to encode
> > <latin small letter e, combining acute accent> as <latin small letter
> > e with acute>.
>
> Agreed. We should probably prohibit doing that for UTF-x literals,
> but I'm not seeing a behavior change for ordinary and wide string
> literals.
>
> > Which is not the current behavior described by the standard.
>
> Could you point me to the specific place where the standard
> doesn't allow that, currently?
>
> [lex.string] p10
> "it is initialized with the given characters."
>
> for example doesn't speak to the question, in my view.
>
My reading is based on the description of the size of the string:
http://eel.is/c++draft/lex.string#1
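Here is a minimal sketch of why the size matters. It assumes a C++17-or-later compiler that, like mainstream implementations, encodes each universal-character-name in a UTF-8 literal separately. The constant names are illustrative.

```cpp
#include <cstddef>

// "e" followed by U+0301 COMBINING ACUTE ACCENT: 1 + 2 UTF-8 code units,
// plus the terminating null.
constexpr std::size_t decomposed_size = sizeof(u8"e\u0301");

// U+00E9 LATIN SMALL LETTER E WITH ACUTE: 2 UTF-8 code units, plus the
// terminating null.
constexpr std::size_t precomposed_size = sizeof(u8"\u00E9");

// With each character encoded separately, the two literals differ in size.
static_assert(decomposed_size == 4);
static_assert(precomposed_size == 3);
```

Suppose an implementation were allowed to encode <e, combining acute accent> as a whole. The first literal could then come out as the two code units of U+00E9, and its size would be 3 rather than 4.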
>
> > I think this is a much more important aspect (whether we think an
> > implementation should be able to do that or not) than trying to
> > describe the idiosyncrasies of all encodings.
>
> Fine. My answer is "yes, an implementation should be allowed to do that".
> (And, as QoI, implementations will avoid shooting their customers in the
> foot more often than not.)
>
> Jens
>
Received on 2020-07-01 03:27:17