sg16: Re: [SG16-Unicode] BOM in JSON

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 19 Aug 2019 17:25:23 -0400

On 8/19/19 4:52 PM, Tony V E wrote:
> https://en.wikipedia.org/wiki/Byte_order_mark#Usage
>
> There is some pertinent advice on that page.

Indeed, some of which would benefit from a citation :)

Tom.

> There is also a note that Visual Studio uses (or used) the BOM to detect
> whether a file is UTF-8 or in some other encoding.
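As a rough illustration of the kind of BOM sniffing described above, here is a minimal Python sketch. It is not Visual Studio's actual logic. The helper name `sniff_bom` and the `cp1252` fallback (standing in for "some other encoding", e.g. a legacy ANSI code page) are assumptions.

```python
import codecs

# Known BOMs, longest first: the UTF-32LE BOM (FF FE 00 00) starts with the
# UTF-16LE BOM (FF FE), so checking UTF-16LE first would misclassify it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, fallback: str = "cp1252") -> tuple[str, int]:
    """Hypothetical helper: return (encoding, BOM length in bytes).

    With no recognised BOM, assume the legacy `fallback` encoding (an
    assumption, not a known Visual Studio default) and report length 0.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return fallback, 0


# Usage: skip the BOM, then decode the rest with the detected encoding.
raw = b"\xef\xbb\xbf{}"
encoding, bom_len = sniff_bom(raw)
text = raw[bom_len:].decode(encoding)
```

A BOM-less file tells this heuristic nothing, which is why the choice of fallback matters so much in practice.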
>
>
> On Mon, Aug 19, 2019 at 3:46 PM Ben Boeckel <ben.boeckel_at_[hidden]> wrote:
>
> On Mon, Aug 19, 2019 at 22:25:05 +0300, Henri Sivonen wrote:
> > On Mon, Aug 19, 2019 at 9:57 PM Ben Boeckel <ben.boeckel_at_[hidden]> wrote:
> > > Notepad?
> >
> > Yes, Notepad. It's generally easier to make parsers of all kinds (XML
> > before, JSON later) accept the UTF-8 BOM than to fight Notepad. It'll
> > take a long time for the existing installed base to get replaced with
> > the newest: https://mobile.twitter.com/JenMsft/status/1163474010509701120
>
> BOMs only make sense in an at-rest, storage-backed JSON file that the
> parser reads directly. Given a string, a JSON parser should *certainly*
> not accept a leading BOM.
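A minimal Python sketch of the file-versus-string distinction drawn above, assuming the caller knows whether it holds raw bytes read from a stored file or an already-decoded string. Both function names are hypothetical.

```python
import json


def parse_stored_json(raw: bytes):
    """Hypothetical helper for bytes read directly from an at-rest file.

    The "utf-8-sig" codec drops one leading UTF-8 BOM if present and
    otherwise decodes as plain UTF-8.
    """
    return json.loads(raw.decode("utf-8-sig"))


def parse_json_text(text: str):
    """Hypothetical helper for an in-memory string.

    A leading U+FEFF is passed through untouched, so Python's json.loads
    rejects it with a JSONDecodeError.
    """
    return json.loads(text)
```

A file would be opened with `open(path, "rb")` and its bytes passed to `parse_stored_json`; only that path tolerates the BOM.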
>
> Quick survey:
>
> % echo $'\xEF\xBB\xBF{}' > bom.json
>
> - jsoncpp: no mention of a BOM in the source, probably unhappy about it
> - jq: fine
> - python3:
>   json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
> - ruby:
>   /usr/share/ruby/json/common.rb:156:in `parse': 765: unexpected token at '\xEF\xBB\xBF{}' (JSON::ParserError)
> - C#: https://jimmybogard.com/the-curious-case-of-the-json-bom/
>
> Based on this short survey, I don't think BOM support is actually all
> that widespread among readers. And the solution seems to be "don't write
> the BOM" wherever the problem is encountered.
>
> I think those sticking to their Notepad guns are just going to have to
> wait for something better, because waiting for the libraries to catch up
> (and for the relevant fixes to be backported to declared minimum supported
> versions) is likely going to take *even longer*. Or they can download a
> real editor and actually contribute to whatever codebase they're trying
> to build.
>
> > > > 2. Consistency with Web browsers.
> > >
> > > I don't see why a web browser would care about these files.
> >
> > Maybe not _these_ JSON files, but a general-purpose JSON parser can
> > still care about consistency with Web browsers.
>
> That's fine. They can then accept the BOM-less files that every writer
> for this format would produce, just like all the other BOM-less JSON
> content transferred over the network.
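On the writing side, RFC 8259 (section 8.1) says implementations MUST NOT add a BOM to networked-transmitted JSON text, while parsers MAY ignore one for interoperability. A minimal Python sketch of a BOM-free writer; the function name `to_wire` is hypothetical:

```python
import codecs
import json


def to_wire(obj) -> bytes:
    """Hypothetical helper: serialize obj as UTF-8 JSON bytes with no BOM.

    The plain "utf-8" codec never emits a BOM (unlike "utf-8-sig"), and
    ensure_ascii=False keeps non-ASCII text as UTF-8 rather than escaping it.
    """
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


payload = to_wire({"editor": "Notepad"})
assert not payload.startswith(codecs.BOM_UTF8)
```

For files, the same idea applies: open them with `encoding="utf-8"` rather than `"utf-8-sig"` when writing.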
>
> --Ben
>
>
>
> --
> Be seeing you,
> Tony
>
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode

Received on 2019-08-19 23:25:26