Subject: Re: Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature
From: Tom Honermann (tom_at_[hidden])
Date: 2020-10-11 22:37:04
On 10/11/20 11:32 PM, JF Bastien wrote:
> It's a bit odd: if you assume the default is ascii then you don't need
> this. If you assume the default is utf8 then you don't need this... so
> when do you need the BOM? It seems like making bad prior choices more
> acceptable... even though they were bad choices. I'm not sure it's a
> good idea.
A BOM would be needed when:
1. The default encoding is ASCII-based (ISO-8859-1, Windows-1252,
etc...) and the UTF-8 text to be produced contains non-ASCII
characters.
2. The default encoding is not ASCII-based (e.g., EBCDIC).
Both of these cases presume that the default encoding can't be made
UTF-8 for backward compatibility reasons.
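
To make the two cases concrete, here is a rough Python sketch of the decision. It is only an illustration under stated assumptions, not text from the proposal: the helper names (is_ascii_based, needs_bom, encode_utf8) are hypothetical, and "ASCII-based" is approximated by checking that printable ASCII characters encode to the same bytes as in ASCII.

```python
import codecs

# Printable ASCII characters, used to probe whether an encoding is ASCII-based.
ASCII_SAMPLE = "".join(chr(c) for c in range(0x20, 0x7F))


def is_ascii_based(encoding: str) -> bool:
    """Assumption: an encoding counts as ASCII-based if printable ASCII
    characters encode to the same bytes as they do in ASCII."""
    try:
        return ASCII_SAMPLE.encode(encoding) == ASCII_SAMPLE.encode("ascii")
    except UnicodeEncodeError:
        return False


def needs_bom(text: str, default_encoding: str) -> bool:
    """Hypothetical helper: does UTF-8 output need a BOM as an encoding
    signature, given the protocol's default encoding?"""
    if codecs.lookup(default_encoding).name == "utf-8":
        return False  # the default is already UTF-8, no signature needed
    if not is_ascii_based(default_encoding):
        return True  # case 2: e.g. an EBCDIC code page such as cp037
    # Case 1: ASCII-based default; only non-ASCII content is ambiguous.
    return not text.isascii()


def encode_utf8(text: str, default_encoding: str) -> bytes:
    """Encode text as UTF-8, prepending a BOM only when needs_bom says so."""
    data = text.encode("utf-8")
    return codecs.BOM_UTF8 + data if needs_bom(text, default_encoding) else data
```

With this sketch, pure ASCII text under an ISO-8859-1 default is emitted unchanged, so existing ASCII files stay ASCII, while the same text under an EBCDIC default would get the signature.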
> On Sun, Oct 11, 2020 at 8:22 PM Tom Honermann via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
> On 10/10/20 7:58 PM, Alisdair Meredith via SG16 wrote:
>> One concern I have, that might lead into rationale for the
>> current discouragement,
>> is that I would hate to see a best practice that pushes a BOM
>> into ASCII files.
>> One of the nice properties of UTF-8 is that a valid ASCII file
>> (still very common) is
>> also a valid UTF-8 file. Changing best practice would encourage
>> updating those
>> files to be no longer ASCII.
> Thanks, Alisdair. I think that concern is implicitly addressed by
> the suggested resolutions, but perhaps that can be made more
> clear. One possibility would be to modify the "protocol designer"
> guidelines to address the case where a protocol's default encoding
> is ASCII based and to specify that a BOM is only required for
> UTF-8 text that contains non-ASCII characters. Would that be helpful?
>>> On Oct 10, 2020, at 14:54, Tom Honermann via SG16
>>> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>>> Attached is a draft proposal for the Unicode standard that
>>> intends to clarify the current recommendation regarding use of a
>>> BOM in UTF-8 text. This is follow up to discussion on the
>>> Unicode mailing list
>>> back in June.
>>> Feedback is welcome. I plan to submit
>>> <https://www.unicode.org/pending/docsubmit.html> this to the UTC
>>> in a week or so pending review feedback.
>>> SG16 mailing list
>>> SG16_at_[hidden] <mailto:SG16_at_[hidden]>