sg16: Re: [SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature

From: Thiago Macieira <thiago_at_[hidden]>
Date: Fri, 16 Oct 2020 13:39:39 -0700

On Friday, 16 October 2020 11:58:16 PDT Jens Maurer wrote:
> > The status quo has remained because there has been nothing forcing a
> > change to the status quo. Yes, there's a lot of old code that, for
> > example, might have comments written in Chinese or Finnish or something
> > else. But nothing has forced those to update. If the critical mass of
> > software is UTF-8, that will force those codebases to recode. And unlike
> > Microsoft's fixing of their own SDK header files to comply with the
> > language, this is a simple recode operation. It can be done by downstream
> > users, with little to no danger.
> Such a recode might be easier for some and harder for others,
> depending on which older versions of compilers need to be
> supported or other environmental factors, possibly beyond the
> immediate control of the developer or project.
>
> I don't believe we have sufficient insight into C++ code at large
> in WG21, let alone SG16, so let's be careful with statements assuming
> that people will be "forced" to do something.

"Old versions of compilers" cannot apply here, since anything we do that
requires any type of marker implies newer versions of the compilers. That
would actually take longer, since those solutions don't exist yet and would
take at least a year to become available to early adopters, and longer still
to reach the community at large.

Contrast that with the "all source is UTF-8" solution, which is already
deployed for the most part. MSVC added support for it (the /utf-8 switch) in
Visual Studio 2015 Update 2, which is not very new. Deployments that haven't
been able to get compilers with UTF-8 support are usually also the most
conservative about updating third-party dependencies, so they aren't likely
to suffer until their next major upgrade anyway, at which point upgrading the
compiler might be acceptable.

In any case, my experience so far is that library developers are already
moving ahead of any recommendation from us, and their solution is "it's UTF-8,
no marker, deal with it." I fear that any recommendation from us that differs
from this will have either no effect or, worse, fracture the community. I'd
much rather we stayed silent and let the community make the decision.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DPG Cloud Engineering

Received on 2020-10-16 15:39:45