SG16

Subject: Re: Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature
From: Tom Honermann (tom_at_[hidden])
Date: 2020-10-11 22:37:04


On 10/11/20 11:32 PM, JF Bastien wrote:
> It's a bit odd: if you assume the default is ASCII then you don't need
> this. If you assume the default is UTF-8 then you don't need this... so
> when do you need the BOM? It seems like making bad prior choices more
> acceptable... even though they were bad choices. I'm not sure it's a
> good idea.

A BOM would be needed when:

 1. The default encoding is ASCII-based (ISO-8859-1, Windows-1252,
    etc.) and the UTF-8 text to be produced contains non-ASCII
    characters, or
 2. The default encoding is not ASCII-based (e.g., EBCDIC).

Both of these cases presume that the default encoding cannot be changed
to UTF-8 for backward compatibility reasons.
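
The two cases above can be sketched in code. This is only a minimal
illustration, not part of the draft proposal: the function names
(is_ascii_based, encode_utf8_for_protocol, decode_from_protocol) are
hypothetical, and it assumes a protocol whose legacy default encoding is
known to both producer and consumer.

```python
import codecs


def is_ascii_based(encoding: str) -> bool:
    """Return True if bytes 0x00-0x7F decode to the same characters as in ASCII."""
    ascii_bytes = bytes(range(128))
    try:
        return ascii_bytes.decode(encoding) == ascii_bytes.decode("ascii")
    except UnicodeDecodeError:
        return False


def encode_utf8_for_protocol(text: str, default_encoding: str) -> bytes:
    """Encode text as UTF-8, adding a BOM only when a consumer needs it.

    Hypothetical helper: the BOM is omitted only when the default encoding
    is ASCII-based and the text is pure ASCII (case 1 does not apply).
    """
    payload = text.encode("utf-8")
    if is_ascii_based(default_encoding) and text.isascii():
        return payload  # identical bytes under ASCII, the default, and UTF-8
    return codecs.BOM_UTF8 + payload


def decode_from_protocol(data: bytes, default_encoding: str) -> str:
    """Treat a leading BOM as a UTF-8 signature; otherwise use the default."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(default_encoding)


if __name__ == "__main__":
    for text, default in [("hello", "cp1252"), ("café", "cp1252"), ("hello", "cp037")]:
        encoded = encode_utf8_for_protocol(text, default)
        print(default, repr(text), "BOM" if encoded.startswith(codecs.BOM_UTF8) else "no BOM")
```

Note that legacy text in the default encoding that happens to begin with
the bytes EF BB BF would be misread as UTF-8 by this sketch; a real
protocol would have to decide whether that ambiguity is acceptable.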

Tom.

>
> On Sun, Oct 11, 2020 at 8:22 PM Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
>
> On 10/10/20 7:58 PM, Alisdair Meredith via SG16 wrote:
>> One concern I have, that might lead into rationale for the
>> current discouragement,
>> is that I would hate to see a best practice that pushes a BOM
>> into ASCII files.
>> One of the nice properties of UTF-8 is that a valid ASCII file
>> (still very common) is
>> also a valid UTF-8 file.  Changing best practice would encourage
>> updating those
>> files to be no longer ASCII.
>
> Thanks, Alisdair.  I think that concern is implicitly addressed by
> the suggested resolutions, but perhaps that can be made clearer.
> One possibility would be to modify the "protocol designer"
> guidelines to address the case where a protocol's default encoding
> is ASCII-based and to specify that a BOM is only required for
> UTF-8 text that contains non-ASCII characters.  Would that be helpful?
>
>
> Tom.
>
>>
>> AlisdairM
>>
>>> On Oct 10, 2020, at 14:54, Tom Honermann via SG16
>>> <sg16_at_[hidden]> wrote:
>>>
>>> Attached is a draft proposal for the Unicode standard that
>>> intends to clarify the current recommendation regarding use of a
>>> BOM in UTF-8 text.  This is a follow-up to the discussion on the
>>> Unicode mailing list
>>> <https://corp.unicode.org/pipermail/unicode/2020-June/008713.html>
>>> back in June.
>>>
>>> Feedback is welcome.  I plan to submit
>>> <https://www.unicode.org/pending/docsubmit.html> this to the UTC
>>> in a week or so pending review feedback.
>>>
>>> Tom.
>>>
>>> <Unicode-BOM-guidance.pdf>
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>>



SG16 list run by sg16-owner@lists.isocpp.org