Date: Wed, 1 Jul 2020 10:14:33 +0200
On 01/07/2020 09.44, Corentin wrote:
>
>
> On Wed, 1 Jul 2020 at 09:29, Jens Maurer via Core <core_at_[hidden]> wrote:
> We should be clear in the text whether an implementation is allowed to encode
> a sequence of non-numeric-escape-sequence s-chars as a whole, or whether
> each character is encoded separately. There was concern that "separately"
> doesn't address stateful encodings, where the encoding of string character
> i+1 may depend on what string character i was.
>
>
> We should be careful not to change the behavior here.
> Encoding sequences as a whole allows an implementation to encode <latin small letter e, combining acute accent> as <latin small letter e with acute>
Agreed. We should probably prohibit doing that for UTF-x literals,
but I'm not seeing a behavior change for ordinary and wide string
literals.
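To make the two forms concrete, here is a minimal sketch. It assumes a
compiler that encodes each code point of a UTF-16 literal separately, with
no composition, which is what the prohibition suggested above would require.
The helper name code_units is hypothetical and introduced only for
illustration.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
constexpr char16_t decomposed[] = u"e\u0301";

// U+00E9 LATIN SMALL LETTER E WITH ACUTE.
constexpr char16_t precomposed[] = u"\u00E9";

// Assumption: no composition takes place, so the two literals differ.
static_assert(code_units(decomposed) == 2, "two UTF-16 code units expected");
static_assert(code_units(precomposed) == 1, "one UTF-16 code unit expected");
static_assert(decomposed[0] == 0x0065 && decomposed[1] == 0x0301,
              "code points are kept as written");
static_assert(precomposed[0] == 0x00E9, "precomposed form is a single unit");

// For an ordinary literal such as "e\u0301" the resulting size depends on
// the implementation's execution encoding, so nothing is asserted about it.
```

If an implementation were allowed to encode the sequence as a whole, the
first assertion could fail for the corresponding ordinary or wide literal,
which is exactly the latitude under discussion.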
> Which is not the current behavior described by the standard.
Could you point me to the specific place where the standard
doesn't allow that, currently?
[lex.string] p10
"it is initialized with the given characters."
for example doesn't speak to the question, in my view.
> I think this is a much more important aspect (whether we think an implementation should be able to do that or not) than trying to describe the idiosyncrasies of all encodings.
Fine. My answer is "yes, an implementation should be allowed to do that".
(And, as QoI, implementations will avoid shooting their customers in the
foot more often than not.)
Jens
Received on 2020-07-01 03:17:51