On Mon, Aug 19, 2019, 15:30 Ben Boeckel <ben.boeckel@kitware.com> wrote:
> On Mon, Aug 19, 2019 at 08:16:26 +0300, Henri Sivonen wrote:
> > For formats that, for legacy reasons, support multiple encodings, the
> > benefit is that the BOM unambiguously signals UTF-8. For UTF-8-only
> > formats, the benefit of not treating the BOM as an error is to allow
> > authoring with tools designed for the kind of formats where the BOM
> > actually signals UTF-8 relative to other possibilities.

> The format specifies that it only accepts UTF-8. Within that context, is
> it sensible to expect implementations to handle a BOM? Remember that it
> is mostly a format between tools, and it is JSON because being able to
> debug it is very useful (without mandating even more code for tools to
> inspect yet another container format). These things should not be
> written by hand or edited manually, so what does one gain by allowing an
> encoded BOM?

Presumably the reason to use JSON instead of a custom format is to make the format consumable with existing JSON libraries. Therefore, it makes sense for the format not to profile JSON but to work with off-the-shelf libraries as they are. I haven't actually surveyed JSON libraries for UTF-8 BOM acceptance, but there are three reasons why accepting a UTF-8 BOM makes sense for a general-purpose JSON parsing library:

1. Compatibility with Windows-ish text editors for those JSON formats that _are_ edited with text editors.
2. Consistency with Web browsers.
3. Honoring the MAY from RFC 8259 (a parser MAY ignore a leading BOM rather than treat it as an error) aligns with Postel's Law (which admittedly has lost quite a bit of its charm).
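
To make concrete what the tolerance amounts to on the consuming side, here's a minimal sketch in Python (the helper name is made up for illustration; "utf-8-sig" is Python's built-in BOM-stripping UTF-8 codec):

import json

def load_tool_json(data: bytes):
    # "utf-8-sig" decodes UTF-8 and silently strips a leading U+FEFF
    # BOM if one is present; BOM-less input decodes unchanged, so this
    # accepts both forms without treating the BOM as an error.
    return json.loads(data.decode("utf-8-sig"))

# Both of these parse to the same value:
load_tool_json(b'{"key": "value"}')
load_tool_json(b'\xef\xbb\xbf{"key": "value"}')

The point is that the tolerance is a couple of lines at the decode step, whereas rejecting the BOM turns output from BOM-writing Windows-ish tools into a hard error.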