Date: Fri, 3 Dec 2021 19:04:50 -0500
On 12/3/21 4:47 PM, Corentin Jabot wrote:
>
>
> On Fri, Dec 3, 2021, 22:03 Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
>
> On 12/1/21 2:28 PM, Corentin Jabot wrote:
>>
>>
>> On Wed, Dec 1, 2021 at 8:13 PM Tom Honermann <tom_at_[hidden]
>> <mailto:tom_at_[hidden]>> wrote:
>>
>> On 11/28/21 5:22 AM, Jens Maurer wrote:
>>> On 28/11/2021 10.42, Corentin Jabot via SG16 wrote:
>>>> On Sun, Nov 28, 2021, 01:31 Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
>>>> 2. If the estimated width of the fill character is greater than 1, then alignment to the end of the available space might not be possible. The choice here is whether to under-fill or over-fill the available space. This possibility is avoided if fill characters are restricted to characters with an estimated width of exactly 1.
>>>> std::format("{:🤡>4}", 123);
>>>>
>>>>
>>>> Is there value in specifying it? Neither solution is great or terrible. I think saying it is unspecified would be fine; so would under-filling, I guess.
>>>>
>>>> Hopefully, we are consistent and choose option 1 among those specified in the LWG issue.
>>>>
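A minimal sketch of the under-fill/over-fill choice for `std::format("{:🤡>4}", 123)`, assuming the clown face (U+1F921) has an estimated width of 2, which is what the C++20 width-estimation range U+1F900 to U+1F9FF gives. The `fill_policy` enum and `compute_padding` helper are hypothetical and are not part of std::format; they only show the arithmetic: one column of padding is needed, but each fill character occupies two.

```cpp
#include <cstddef>

// Hypothetical policy knob; std::format has no such option.
enum class fill_policy { under_fill, over_fill };

struct padding {
    std::size_t fill_count;   // how many fill characters are emitted
    std::size_t total_width;  // estimated width of content plus fill
};

// Pads content of estimated width `content_width` toward `field_width`
// using a fill character of estimated width `fill_width`.
constexpr padding compute_padding(std::size_t field_width,
                                  std::size_t content_width,
                                  std::size_t fill_width,
                                  fill_policy policy) {
    if (fill_width == 0 || content_width >= field_width)
        return {0, content_width};  // nothing to pad (or nothing to pad with)
    const std::size_t gap = field_width - content_width;  // columns to fill
    const std::size_t count =
        policy == fill_policy::under_fill
            ? gap / fill_width                      // round down: may fall short
            : (gap + fill_width - 1) / fill_width;  // round up: may overshoot
    return {count, content_width + count * fill_width};
}

// Field width 4, "123" has width 3, and U+1F921 is assumed to have width 2.
static_assert(compute_padding(4, 3, 2, fill_policy::under_fill).total_width == 3);
static_assert(compute_padding(4, 3, 2, fill_policy::over_fill).total_width == 5);
```

With a fill character of estimated width 1, both policies produce the same result, which is why restricting fill characters to an estimated width of exactly 1 avoids the question entirely.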
>>>> For P2286R3 <https://wg21.link/p2286r3>, LEWG requested <https://lists.isocpp.org/sg16/2021/11/2845.php> that SG16 consider the ramifications for support of user-defined delimiters. We should also discuss the "?" specifier proposed to explicitly opt in to quoted and escaped formats for std::string, std::string_view, and arrays of char/wchar_t.
>>>>
>>>> Not sure the quoted thing is in our purview.
>>>>
>>>> For the delimiter, we should support code points, to be consistent with everything else. The issue is that we don't have experience with that, AFAIK.
>>> But the compile-time format string parser might not necessarily understand
>>> the details of the literal encoding, so it's unclear how codepoints map to
>>> code units. Or are you saying that the rest of std::format already requires
>>> detailed understanding, anyway?
>>
>> I believe the compile-time format string parser is already
>> required to understand such details. For example, if the
>> literal encoding is Shift-JIS, then the parser would need to
>> be able to differentiate byte values that appear as lead code
>> units vs trailing code units (since, for example, a 0x5C code
>> unit denotes the '\' character if it is a lead code unit, but
>> that value may also appear as a trailing code unit for a
>> double byte character).
>>
>> I think Jens is right. MSVC does handle Shift-JIS specifically,
>> but I'm not sure we can or should mandate something that works
>> universally; the burden on implementations could be high.
>
> Are you suggesting that we should revisit the consensus for the
> proposed resolution for LWG3576
> <https://cplusplus.github.io/LWG/issue3576> from our 2021-08-25
> telecon
> <https://github.com/sg16-unicode/sg16-meetings#august-25th-2021>?
>
>
>
> I am concerned about implementability.
> The current resolution calls for a compile-time mechanism to read a
> code point in an arbitrary encoding.
> Such a mechanism does not currently exist.
> For an implementation like GCC, the generic solution would be to
> expose iconv facilities through builtins (at least the equivalent of
> mblen or mbrtocX, I think, as Hubert pointed out).
> This seems... a lot to ask in an issue resolution.
> I don't remember whether that was considered last time or whether it
> constitutes new information in any way, but we might want to bring it up again.
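For comparison, when the literal encoding is known to be UTF-8, reading one code point at compile time needs only a small constexpr routine. The sketch below is a hypothetical helper (`decode_utf8` and `decoded` are illustrative names, not a standard facility) that rejects truncated, overlong, surrogate, and out-of-range sequences. The concern above is that nothing equivalent is available at compile time for an arbitrary literal encoding: the runtime counterparts (mblen, mbrtoc32) are not constexpr and depend on the current C locale.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

struct decoded {
    char32_t cp;      // decoded code point
    std::size_t len;  // number of code units consumed
};

// Hypothetical helper: decodes the first UTF-8 code point of s, or returns
// std::nullopt if the leading sequence is ill-formed.
constexpr std::optional<decoded> decode_utf8(std::string_view s) {
    if (s.empty()) return std::nullopt;
    const auto b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) return decoded{b0, 1};  // ASCII fast path

    std::size_t len = 0;
    char32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return std::nullopt;  // stray continuation byte or invalid lead byte

    if (s.size() < len) return std::nullopt;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }

    // Smallest code point that legitimately needs `len` bytes (rejects overlongs).
    constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return std::nullopt;
    return decoded{cp, len};
}

// U+1F921 encoded in UTF-8, decoded during constant evaluation.
static_assert(decode_utf8("\xF0\x9F\xA4\xA1")->cp == 0x1F921);
```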
I believe this requirement is already the status quo. Let me provide a
better example than I did previously.
std::format("<text>");
If the literal encoding is not self-synchronizing then <text> may
contain code units that correspond to the (single) code unit for '{' but
that do not encode the '{' character. This can happen due to DBCS or
shift-state encoding. An implementation needs to be able to recognize
this case (for affected encodings) in order to avoid incorrectly
interpreting the text as containing an introducer for a replacement field.
Tom.
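To make the Shift-JIS case concrete, here is a minimal sketch, assuming the common Shift-JIS lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xFC, with every other byte treated as a single-byte character. A naive byte scan reports a '{' wherever the byte 0x7B appears, including when it is the trail byte of a double-byte character; the same problem affects 0x5C ('\'). Both functions are hypothetical and not taken from any implementation; a real parser would also need to diagnose invalid or truncated sequences.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// True if b can begin a Shift-JIS double-byte character.
constexpr bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Naive scan: every 0x7B byte is treated as '{'.
std::vector<std::size_t> find_braces_naive(std::string_view s) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == '{') out.push_back(i);
    return out;
}

// Encoding-aware scan: a lead byte consumes the following trail byte, so a
// trail byte equal to 0x7B ('{') or 0x5C ('\') is never mistaken for that character.
std::vector<std::size_t> find_braces_sjis(std::string_view s) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < s.size();) {
        const auto b = static_cast<unsigned char>(s[i]);
        if (is_sjis_lead_byte(b)) {
            i += 2;  // skip lead and trail bytes; a truncated pair simply ends the loop
            continue;
        }
        if (b == '{') out.push_back(i);
        ++i;
    }
    return out;
}
```

For the 4-byte input "\x81\x7B{}" (a double-byte character whose trail byte is 0x7B, followed by a real "{}"), the naive scan reports offsets 1 and 2, while the Shift-JIS-aware scan reports only offset 2.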
>
> Tom.
>
>> I agree with Corentin that delimiters should be restricted to
>> code points. That is consistent with the direction we have
>> already advocated for fill characters in LWG3576
>> <https://cplusplus.github.io/LWG/issue3576>.
>>
>> Tom.
>>
>
Received on 2021-12-03 18:04:54
