sg16: Re: [SG16-Unicode] BOM in JSON (was: Re: SG16 meeting summary for July 31st, 2019)

From: Ben Boeckel <ben.boeckel_at_[hidden]>
Date: Mon, 19 Aug 2019 15:46:08 -0400

On Mon, Aug 19, 2019 at 22:25:05 +0300, Henri Sivonen wrote:
> On Mon, Aug 19, 2019 at 9:57 PM Ben Boeckel <ben.boeckel_at_[hidden]> wrote:
> > Notepad?
>
> Yes, Notepad. It's generally easier to make parsers of all kinds (XML
> before, JSON later) accept the UTF-8 BOM than to fight Notepad. It'll
> take a long time for the existing installed base to get replaced with
> the newest: https://mobile.twitter.com/JenMsft/status/1163474010509701120

BOMs only make sense for a JSON file at rest on storage that the parser
reads directly. Given a string, a JSON parser should *certainly* not
accept a leading BOM.
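
To make that split concrete, here is a minimal Python sketch. The helper
name `load_json_file` is made up for illustration and not taken from any
library. The file-reading layer drops an optional UTF-8 BOM by decoding
with the standard "utf-8-sig" codec, and the string parser underneath
never has to know about it:

```python
import json

def load_json_file(path):
    # Hypothetical helper: the BOM is a property of the stored file, so
    # it is handled here, at the I/O boundary. "utf-8-sig" strips one
    # leading U+FEFF if present and otherwise behaves like "utf-8".
    with open(path, encoding="utf-8-sig") as f:
        text = f.read()
    # The string-level parser only ever sees BOM-less text.
    return json.loads(text)
```

Something like `load_json_file("bom.json")` should then work whether or
not the file starts with EF BB BF, while a bare string still gets no
special treatment.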

Quick survey:

    % echo $'\xEF\xBB\xBF{}' > bom.json

  - jsoncpp: no mention of a BOM in the source, probably unhappy about
    it
  - jq: fine
  - python3:
    json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
  - ruby:
    /usr/share/ruby/json/common.rb:156:in `parse': 765: unexpected token at '\xEF\xBB\xBF{}' (JSON::ParserError)
  - C#: https://jimmybogard.com/the-curious-case-of-the-json-bom/
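
For reference, the python3 entry can be reproduced without a file. This
is only a sketch of what I'd expect from a recent CPython 3; the variable
names are mine. The bytes are the same ones the echo above writes:

```python
import json

raw = b"\xef\xbb\xbf{}\n"  # same bytes as bom.json above

# Decoding as plain UTF-8 keeps the BOM as a leading U+FEFF character,
# and json.loads() refuses a str that starts with it.
error = None
try:
    json.loads(raw.decode("utf-8"))
except json.JSONDecodeError as exc:
    error = str(exc)

# Decoding as "utf-8-sig" (as the error message suggests) drops the BOM
# before the parser ever sees the text.
parsed = json.loads(raw.decode("utf-8-sig"))
```

So the rejection happens at the string level, and the fix is applied at
the decoding step rather than inside the parser.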

Based on this short survey, I don't know that BOM support is actually
all that widespread in readers. And the solution seems to be "don't
write the BOM" where the problem is encountered.
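
On the writer side, "don't write the BOM" is usually just a matter of
picking the right codec. A small Python sketch (again, `dump_json_file`
is a made-up name, not an existing API):

```python
import json

def dump_json_file(obj, path):
    # Hypothetical helper: plain "utf-8" never emits a BOM, whereas
    # "utf-8-sig" would prepend EF BB BF to the output.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f)
```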

I think those sticking to their Notepad guns are just going to have to
wait for something better because waiting for the libraries to catch up
(and the relevant fixes to be backported to declared minimum supported
versions) is likely going to take *even longer*. Or they can download a
real editor and actually contribute to whatever codebase they're trying
to build.

> > > 2. Consistency with Web browsers.
> >
> > I don't see why a web browser would care about these files.
>
> Maybe not _these_ JSON files, but a general-purpose JSON parser can
> still care about consistency with Web browsers.

That's fine. They can then accept the BOM-less files that every writer
for this format would write, just like all the other BOM-less
network-transferred JSON content in the world.

--Ben

Received on 2019-08-19 21:46:10