[SG16] 2nd draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature

From: Tom Honermann <tom_at_[hidden]>
Date: Sat, 2 Jan 2021 23:15:02 -0500
Happy New Year! And what better way to start off a new year than by
discussing the utility (or lack thereof) of BOMs in UTF-8 text!

Attached is a 2nd draft of a paper intended to clarify guidance in the
Unicode standard for when a BOM should or should not be used in UTF-8
text. Discussion of the prior draft can be found in the Unicode.org
mail archives.

This draft contains the following changes:

 1. An abstract was added.
 2. The Introduction section was modified as follows:
     1. A link to the email thread with initial draft feedback was added.
     2. The text was modified to highlight inconsistent interpretation
        of the existing guidance as opposed to the intent.
     3. A quote from section 2.13, "Special Characters" regarding
        Unicode signatures was added.
 3. The Proposed Resolution section was modified as follows:
     1. The section was renamed from "Possible Resolutions".
     2. The previously discussed possible changes are now presented as
        two distinct options.
     3. Proposed wording was added for the first option.
     4. The proposed wording for the second option was directed to
        section 23.8.
     5. Option 2 was modified as follows:
         1. The guidance for protocol designers was updated to avoid
            adding a BOM to ASCII text, thus rendering such text non-ASCII.
         2. The guidance for text authors regarding when to use a BOM
            was expanded to cover files that may be opened by
            applications with different encoding expectations.
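
To illustrate the concern behind item 3.5.1, here is a minimal Python sketch
(not taken from the paper). It shows that prefixing pure ASCII text with the
UTF-8 encoding of U+FEFF (bytes EF BB BF) produces data that is no longer
valid ASCII. The helper names is_ascii and with_utf8_bom are hypothetical and
exist only for this example.

```python
import codecs


def is_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def with_utf8_bom(data: bytes) -> bytes:
    """Prefix data with the UTF-8 BOM (EF BB BF) unless it is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


# Pure ASCII content, as a protocol or tool restricted to ASCII might expect.
plain = "int main() {}\n".encode("ascii")

# The same content with a UTF-8 encoding signature prepended.
signed = with_utf8_bom(plain)

print(is_ascii(plain))   # ASCII check on the original bytes
print(is_ascii(signed))  # ASCII check after the BOM was added
```

A consumer that expects strict ASCII would reject or misread the second
form, whereas a BOM-aware UTF-8 decoder (Python's "utf-8-sig" codec, for
example) strips the signature and recovers the original text.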

Thank you to everyone who shared their thoughts on the prior draft.

Assuming no substantially new feedback, I plan to submit this paper in a
week or so.


Received on 2021-01-02 22:15:11