sg16: Re: [SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 16 Oct 2020 16:59:15 -0400

On 10/16/20 4:39 PM, Thiago Macieira via SG16 wrote:
> On Friday, 16 October 2020 11:58:16 PDT Jens Maurer wrote:
>>> The status quo has remained because there has been nothing forcing a
>>> change to the status quo. Yes, there are a lot of old codebases that, for
>>> example, might have comments written in Chinese or Finnish or something
>>> else. But nothing has forced those to update. If the critical mass of
>>> software is UTF-8, that will force those codebases to recode. And unlike
>>> Microsoft's fixing of their own SDK header files to comply with the
>>> language, this is a simple recode operation. It can be done by downstream
>>> users, with little to no danger.
>> Such a recode might be easier for some and harder for others,
>> depending on which older versions of compilers need to be
>> supported or other environmental factors, possibly beyond the
>> immediate control of the developer or project.
>>
>> I don't believe we have sufficient insight into C++ code at large
>> in WG21, let alone SG16, so let's be careful with statements assuming
>> that people will be "forced" to do something.
> "Old versions of compilers" cannot apply here, since anything we do that
> requires any type of marker implies newer versions of the compilers. That will
> actually take longer, since those solutions don't exist yet and would take at
> least a year to become available to early adopters, let alone to the
> community at large.
>
> Contrast that to the "all source is UTF-8" solution, which is already deployed
> for the most part. MSVC added support for it with version 2015 update 2, which
> is not very new. Deployments that haven't been able to get compilers with
> support for UTF-8 are usually also those most conservative in terms of
> updating third-party dependencies, so they aren't likely to suffer until their
> next major upgrade anyway, at which point upgrading the compiler might be
> acceptable.
>
> In any case, my experience so far is that library developers are already
> moving ahead of any recommendation from us. And that solution is "it's UTF-8,
> no marker, deal with it." I fear that our making any recommendation that is
> different from this will either have no effect or, worse, cause a fracture in
> the community. I'd much rather we stayed silent instead and let the community
> make the decision.
>
I strongly agree that any recommendation that is not aligned with that
approach is doomed to fail. I'd like to enable a solution that supports
a best practice in which files that are *not* UTF-8 are marked in some
way while acknowledging and supporting exceptional circumstances in
which it makes sense to mark UTF-8 files instead, hopefully as a
short-term solution.

Tom.

Received on 2020-10-16 15:59:24