sg16: Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: Corentin <corentin.jabot_at_[hidden]>
Date: Mon, 9 Sep 2019 18:26:48 +0200

On Mon, 9 Sep 2019 at 16:31, Tony V E <tvaneerd_at_[hidden]> wrote:

>
>
> On Mon, Sep 9, 2019 at 3:31 AM Corentin <corentin.jabot_at_[hidden]> wrote:
>
>>
>>
>> On Mon, 9 Sep 2019 at 01:25, Tom Honermann <tom_at_[hidden]> wrote:
>>
>>>
>>> On Sep 8, 2019, at 3:31 PM, Tony V E via Lib <lib_at_[hidden]>
>>> wrote:
>>>
>>> Do we have / could we have / should we have
>>> a clear long term (20 years) direction for text in C++?
>>>
>>>
>>> I would like that very much, but we don’t control the ecosystem, and
>>> will have to, to some degree, roll with where the community takes us.
>>>
>>
>> The community is waiting for us to catch up, and I do believe we have some
>> control.
>>
>
> yep, every other language just decided for the community.
>
> As C++, we have to allow the user to do _anything_, but they already can.
> And they will still be able to.
>
>
>
>>
>>>
>>>
>>> i.e., the long-term direction is Unicode,
>>> and/or specifically the long-term direction is UTF8.
>>>
>>>
>>> I think we do have widespread agreement on that, though UTF-16 is
>>> likely to remain strongly relevant in some niches.
>>>
>>> We expect everyone to use char8_t then? Or we expect char to become
>>> utf8 someday?
>>>
>>>
>>> I think it is very unlikely that there will be a mass migration to
>>> char8_t. My expectation is that it will be used for the internal encoding
>>> within some percentage of new projects and components.
>>>
>>> With regard to char, I expect it to remain the type used for text that
>>> may or may not be UTF-8.
>>>
>>> I think Microsoft will eventually provide (non-experimental) means to
>>> use UTF-8 with Win32 and that this will likely come in three forms
>>>
>>
>>> 1) support for UTF-8 as the system wide Active Code Page (ACP). This is
>>> already available as an experimental option.
>>>
>>
>> They di
>>
>>
>>>
>>> 2) support for executables to opt-in to a per-process override of the
>>> system wide ACP. In this mode, stdio would presumably traffic in the system
>>> wide ACP and require transcoding (I don’t think implicit transcoding is
>>> realistic). This is already available as an experimental option.
>>>
>>
>>
>> They do
>>
>
> How do "override system wide ACP" and "stdio traffic in system wide ACP"
> fit together? Either my process thinks the world is on the UTF8 ACP, or it
> doesn't. I would expect transcoding or whatever else is required. I would
> expect fopen to work, etc.
>

Everything within the program assumes the ACP. If you are trying to, say,
output to the console and the ACP is UTF-8, the console has to expect UTF-8.

>
> If that works, I believe almost every Windows developer will turn this on,
> and char will be UTF8 (as it is on Linux, IIUC).
> Most code will "just work".
>
> In 10 years, it will be the assumption.
>
> I think we should steer in the direction that char becomes UTF8.
>
> In the short term we could say char is whatever the system is in, but we
> encourage UTF8. Or something like that. Maybe the standard "assumes"
> UTF8, but implementations are allowed to vary. Whatever "assumes" means
> for a given API.
> We could define things like fmt to be "if the system is UTF8, then
> behaviour is X, otherwise YMMV (ie implementation defined)".
>
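
To make the "if the system is UTF8, then behaviour is X" idea concrete, here
is a minimal sketch of how the same field width gives different padding
depending on what is counted. The helper names (code_units, code_points,
pad_right) and the is_utf8 flag are hypothetical, for illustration only; this
is not how std::format or fmt is specified. It assumes well-formed UTF-8 and
treats code points as the unit, which is only one of the candidates (grapheme
clusters or display width are others).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: number of char code units (bytes) in s.
std::size_t code_units(std::string_view s) { return s.size(); }

// Hypothetical helper: number of code points, ASSUMING s is well-formed
// UTF-8. Every byte that is not a continuation byte (10xxxxxx) starts a
// new code point.
std::size_t code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char b : s)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}

// Pads s on the right with spaces up to `width`. The width is measured in
// code points when the caller says the text is UTF-8, and in code units
// otherwise (standing in for the "implementation defined" case).
std::string pad_right(std::string_view s, std::size_t width, bool is_utf8) {
    const std::size_t used = is_utf8 ? code_points(s) : code_units(s);
    std::string out(s);
    if (used < width) out.append(width - used, ' ');
    return out;
}

// Small self-check: "café" with é encoded as the two bytes C3 A9.
void check_examples() {
    const std::string_view cafe = "caf\xC3\xA9";
    assert(code_units(cafe) == 5);
    assert(code_points(cafe) == 4);
    assert(pad_right(cafe, 6, true).size() == 7);   // 5 bytes + 2 spaces
    assert(pad_right(cafe, 6, false).size() == 6);  // 5 bytes + 1 space
}
```

With a width of 6, the UTF-8 path appends two spaces and the code-unit path
appends one, so columns only line up in the first case.
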
>
> --
> Be seeing you,
> Tony
>

Received on 2019-09-09 18:27:01