Re: [SG16-Unicode] BOM in JSON (was: Re: SG16 meeting summary for July 31st, 2019)

From: Henri Sivonen <hsivonen_at_[hidden]>
Date: Mon, 19 Aug 2019 21:36:38 +0300
On Mon, Aug 19, 2019, 15:30 Ben Boeckel <ben.boeckel_at_[hidden]> wrote:

> On Mon, Aug 19, 2019 at 08:16:26 +0300, Henri Sivonen wrote:
> > For formats that, for legacy reasons, support multiple encodings, the
> > benefit is that the BOM unambiguously signals UTF-8. For UTF-8-only
> > formats, the benefit of not treating the BOM as an error is to allow
> > authoring with tools designed for the kind of formats where the BOM
> > actually signals UTF-8 relative to other possibilities.
> The format specifies that it only accepts UTF-8. Within that context, is
> it sensible to expect implementations handle a BOM? Remember that it is
> mostly a format between tools and it is JSON because being able to debug
> it is very useful (without mandating even more code for tools to inspect
> yet another container format). These things should not be written by
> hand or edited manually, so what does one gain by allowing an encoded
> BOM?

Presumably the reason to use JSON instead of a custom format is to make the
format consumable with JSON libraries. Therefore, it makes sense for it not
to profile JSON but to work with off-the-shelf libraries. I haven't
actually surveyed JSON libraries for UTF-8 BOM acceptance, but there are
three reasons why UTF-8 BOM acceptance makes sense for a general-purpose
JSON parsing library:

1. Compatibility with Windows-ish text editors for those JSON formats that
_are_ edited with text editors.
2. Consistency with Web browsers.
3. Doing the MAY from the RFC aligns with Postel's Law (which admittedly
has lost quite a bit of its charm).
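
To make the MAY concrete: RFC 8259 says that parsers MAY ignore a leading
byte order mark rather than treat it as an error. Below is a minimal sketch
of what that could look like in front of an off-the-shelf parser. The helper
names strip_utf8_bom and parse_json_lenient are hypothetical, made up for
this illustration, and the sketch assumes the input arrives as raw bytes
that are otherwise strict UTF-8. It is not a claim about how any particular
library behaves.

```python
import codecs
import json


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def parse_json_lenient(data: bytes):
    """Hypothetical helper: drop an optional BOM, then parse strict UTF-8 JSON."""
    # Only a BOM at the very start is ignored; U+FEFF anywhere else is left
    # for the JSON parser to accept or reject according to its own rules.
    text = strip_utf8_bom(data).decode("utf-8")
    return json.loads(text)
```

With this, a file saved by an editor that prepends a BOM and a file without
one parse to the same value, while malformed UTF-8 is still rejected by the
decode step.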


Received on 2019-08-19 20:36:54