On Mon, Sep 9, 2019 at 3:31 AM Corentin <corentin.jabot@gmail.com> wrote:

On Mon, 9 Sep 2019 at 01:25, Tom Honermann <tom@honermann.net> wrote:

On Sep 8, 2019, at 3:31 PM, Tony V E via Lib <lib@lists.isocpp.org> wrote:

Do we have / could we have / should we have

a clear long term (20 years) direction for text in C++?

I would like that very much, but we don’t control the ecosystem, and will have to, to some degree, roll with where the community takes us.

The community is waiting for us to catch up, and I do believe we have some control.

yep, every other language just decided for the community.

As C++, we have to allow the user to do _anything_, but they already can. And they will still be able to.

i.e. the long-term direction is Unicode.

and/or specifically the long-term direction is UTF-8.

I think we do have widespread agreement on that, though UTF-16 is likely to remain strongly relevant in some niches.

Do we expect everyone to use char8_t then? Or do we expect char to become UTF-8 someday?

I think it is very unlikely that there will be a mass migration to char8_t. My expectation is that it will be used for the internal encoding within some percentage of new projects and components.

With regard to char, I expect it to remain the type used for text that may or may not be UTF-8.
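Since char data "may or may not be UTF-8", code that receives it usually has to check before treating it as UTF-8. A minimal sketch of such a check, assuming C++17; the function name is_valid_utf8 is hypothetical and not part of any standard or proposed API:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects overlong encodings, UTF-16 surrogates, and values above U+10FFFF.
bool is_valid_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) {  // ASCII, one byte
            ++i;
            continue;
        }
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;  // would otherwise be overlong
            if (b0 == 0xED) hi = 0x9F;  // would otherwise be a surrogate
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;  // would otherwise be overlong
            if (b0 == 0xF4) hi = 0x8F;  // would otherwise exceed U+10FFFF
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (s.size() - i < len) return false;  // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}
```

For example, is_valid_utf8("caf\xC3\xA9") holds, while the Latin-1 bytes "caf\xE9" are rejected. A passing check does not prove the text was meant as UTF-8, only that it is well-formed.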

I think Microsoft will eventually provide (non-experimental) means to use UTF-8 with Win32 and that this will likely come in three forms

1) support for UTF-8 as the system wide Active Code Page (ACP). This is already available as an experimental option.

They di

2) support for executables to opt-in to a per-process override of the system wide ACP. In this mode, stdio would presumably traffic in the system wide ACP and require transcoding (I don’t think implicit transcoding is realistic). This is already available as an experimental option.

They do

How do "override system wide ACP" and "stdio traffic in system wide ACP" fit together? Either my process thinks the world is on the UTF-8 ACP, or it doesn't. I would expect transcoding or whatever else is required. I would expect fopen to work, etc.

If that works, I believe almost every Windows developer will turn this on, and char will be UTF-8 (as it is on Linux, IIUC).
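A rough sketch of how a program might ask "is my narrow encoding UTF-8?" on both platforms. It assumes C++17; on Windows it uses GetACP(), and everywhere else it assumes a POSIX system with nl_langinfo(CODESET), which reports the current locale and so needs a prior std::setlocale(LC_ALL, ""). The names encoding_name_is_utf8 and process_encoding_is_utf8 are hypothetical:

```cpp
#include <cctype>
#include <string>
#include <string_view>

#if defined(_WIN32)
#include <windows.h>
#else
#include <langinfo.h>  // assumption: any non-Windows target is POSIX
#endif

// Hypothetical helper: accepts spellings such as "UTF-8", "utf8", "UTF_8".
bool encoding_name_is_utf8(std::string_view name) {
    std::string norm;
    for (char c : name) {
        if (c == '-' || c == '_') continue;  // ignore separators
        norm += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return norm == "utf8";
}

// Hypothetical helper: true if the process-wide narrow encoding is UTF-8.
bool process_encoding_is_utf8() {
#if defined(_WIN32)
    // Active code page as seen by this process; 65001 is CP_UTF8.
    return GetACP() == CP_UTF8;
#else
    // Codeset of the current locale; call std::setlocale(LC_ALL, "") first
    // to pick up the user's environment instead of the default "C" locale.
    const char* codeset = nl_langinfo(CODESET);
    return codeset != nullptr && encoding_name_is_utf8(codeset);
#endif
}
```

Whether stdio and fopen then behave consistently with that answer is exactly the open question above; this only shows how the encoding could be queried.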

Most code will "just work".

In 10 years, it will be the assumption.

I think we should steer in the direction that char becomes UTF-8.

In the short term we could say char is whatever the system is in, but we encourage UTF-8. Or something like that. Maybe the standard "assumes" UTF-8, but implementations are allowed to vary. Whatever "assumes" means for a given API.

We could define things like fmt to be "if the system is UTF-8, then behaviour is X, otherwise YMMV (i.e. implementation defined)".
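To make the "if UTF-8 then X, otherwise YMMV" shape concrete, here is a small sketch, assuming C++17. It is not how fmt is specified; text_length and count_code_points are hypothetical names, and the fallback chosen here (byte count) merely stands in for whatever an implementation would define:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in well-formed UTF-8 by skipping
// continuation bytes (those of the form 10xxxxxx). Input is assumed valid.
std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical API: behaviour is specified when the system encoding is UTF-8
// and falls back to an implementation-chosen answer (here: bytes) otherwise.
std::size_t text_length(std::string_view s, bool system_is_utf8) {
    return system_is_utf8 ? count_code_points(s) : s.size();
}
```

With the two-byte sequence "\xC3\xA9", text_length returns 1 when the system is UTF-8 and 2 under the fallback.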

Be seeing you,

Tony