Date: Wed, 8 Dec 2021 17:40:19 -0500
On 12/5/21 2:26 PM, Jens Maurer wrote:
> On 05/12/2021 01.04, Tom Honermann wrote:
>> On 12/4/21 6:05 AM, Jens Maurer wrote:
>>> If we impose a requirement for a code unit -> code point decoder for the
>>> literal encoding at compile-time, we should make such a facility generally
>>> available instead of hiding it in the guts of the std::format parser.
>> I think JeanHeyd's work on P1629 <https://wg21.link/p1629> will fill this niche. It would be nice if the features he proposes in N2730 <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2730.htm> were usable at compile-time as well, but that will likely have to await some kind of constexpr support in C.
> Why? We've made functions constexpr that are inherited from C
> before.
Sure, we have, and could do so again. In this case, there are behaviors
that we would have to specify that should be decided in conjunction with
WG14. For example, the N2730 "mc" and "mwc" function variants operate on
the locale-dependent execution encoding. We would have to specify what
that means for compile-time evaluation. The obvious answer is, of
course, that it means the ordinary/wide literal encoding. Since that
encoding may differ from the run-time execution encoding, this
presumably means defining a locale (or at least the LC_CTYPE locale
category) for use at compile-time. We would then have to tie the
behavior to std::is_constant_evaluated() (so that the separation of
compile-time vs run-time is rigorously defined) for which there is
presently no corresponding C facility.
These are not necessarily simple functions that can readily be
inlined or made builtins. As we've previously discussed, EBCDIC code
pages do not all consistently encode '{' and '}'. An ISO-2022 escape
mechanism that allows switching character sets presumably would require
the implementation to track shift state and have access to character set
tables in order to recognize all encodings of these characters. Though,
perhaps such an encoding is disallowed by [lex.charset]p6
<http://eel.is/c++draft/lex.charset#6>? It isn't clear to me how to
apply that wording to shift-state encodings.
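As a rough sketch of why shift state matters, the following scans a toy
subset of ISO-2022-JP (RFC 1468) for a real '{'. Only two escape
sequences are handled (ESC ( B for ASCII and ESC $ B for JIS X 0208);
find_open_brace and sample are hypothetical names, and the code assumes
an ASCII-compatible ordinary literal encoding. A real implementation
would also need the character set tables, which are omitted here.

```cpp
#include <cstddef>
#include <string_view>

// Toy model: the two character sets this sketch knows how to switch between.
enum class charset { ascii, jis_x_0208 };

// Returns the index of the first '{' that is encoded as a single-byte ASCII
// character, or npos if there is none. Bytes inside two-byte JIS X 0208
// characters are skipped even when one of them has the value 0x7B.
constexpr std::size_t find_open_brace(std::string_view s) {
    charset cs = charset::ascii;  // ISO-2022-JP text starts in ASCII
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] == '\x1B' && i + 2 < s.size()) {
            if (s[i + 1] == '(' && s[i + 2] == 'B') { cs = charset::ascii; i += 3; continue; }
            if (s[i + 1] == '$' && s[i + 2] == 'B') { cs = charset::jis_x_0208; i += 3; continue; }
        }
        if (cs == charset::ascii) {
            if (s[i] == '{') return i;
            i += 1;
        } else {
            i += 2;  // one two-byte character
        }
    }
    return std::string_view::npos;
}

// ESC $ B, the two-byte character 0x30 0x7B, ESC ( B, then "{}" in ASCII.
inline constexpr std::string_view sample = "\x1B$B0{\x1B(B{}";

static_assert(find_open_brace(sample) == 8);  // shift-state aware
static_assert(sample.find('{') == 4);         // naive byte search: false match
```

The naive search stops inside the two-byte character, which is the kind
of false match a std::format parser would have to avoid.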
Tom.
Received on 2021-12-08 16:40:24