Thank you for writing this up, Thiago!

On 9/5/19 12:12 AM, Thiago Macieira wrote:
== Transport ==
P1689 suggests using JSON. I'm comparing that in the context of the three 
options with a binary format (CBOR).

One thing SG16 is completely in agreement of is that if you go with JSON, you 
must obey RFC 8259: there must not be a BOM and the file must be encoded in 
UTF-8.

We haven't polled anything, so saying we're all in agreement is premature.  Additionally, we discussed this further in the SG16 meeting yesterday and I think we determined that a BOM *may* be present.

RFC 8259 section 8.1 states: (emphasis mine)

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text.  However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.

Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text.  In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

My reading of this is that RFC 8259 permits use of non-UTF-8 encodings in some situations, namely within a closed ecosystem.  Whether the situation that P1689 is defined for qualifies is something that could be debated.  If we consider the build system and compiler invocations to form a closed system, then the dependency file could be, for example, EBCDIC encoded JSON and still conform to RFC 8259.  I'm not arguing for or against such a position at this time, but rather noting that, if SG15 requires UTF-8 encoded JSON, that requirement is arguably more restrictive than what RFC 8259 requires.
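
To make the closed-ecosystem reading concrete, here is a minimal Python sketch.  It is only an illustration under assumptions: the record contents and the function names are made up, and code page 037 merely stands in for "some EBCDIC encoding".  A producer and consumer that agree on EBCDIC round-trip the JSON text without trouble, while a consumer that insists on UTF-8 cannot even decode it.

```python
import json

# Hypothetical dependency record; the key name is invented for illustration
# and is not taken from P1689.
record = {"example": ["a.cpp", "b.cpp"]}

def write_ebcdic(obj) -> bytes:
    """Producer in a closed ecosystem: JSON text encoded as EBCDIC (cp037)."""
    return json.dumps(obj).encode("cp037")

def read_ebcdic(data: bytes):
    """Consumer in the same ecosystem: decodes with the agreed code page."""
    return json.loads(data.decode("cp037"))

def read_utf8_only(data: bytes):
    """Consumer that accepts only UTF-8 encoded JSON text."""
    return json.loads(data.decode("utf-8"))

payload = write_ebcdic(record)
assert read_ebcdic(payload) == record

try:
    read_utf8_only(payload)
except UnicodeDecodeError:
    # '{' is byte 0xC0 in cp037, which is never valid in UTF-8.
    print("UTF-8-only consumer rejected the EBCDIC payload")
```

Whether such a file should count as conforming is exactly the question of whether the build system and compiler form a closed ecosystem.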

My reading of the BOM requirements is that they only apply to UTF-8 data sent over the network and that use of a BOM in file contents is permitted.
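
As a rough sketch of the "MAY ignore" allowance on the consuming side, a reader could strip a leading UTF-8 BOM before parsing.  The function name below is hypothetical, and the sketch assumes the file is UTF-8 encoded.

```python
import codecs
import json

def load_json_ignoring_bom(data: bytes):
    """Parse UTF-8 encoded JSON text, skipping a leading BOM if present."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return json.loads(data.decode("utf-8"))

# Both inputs parse to the same value; only the first carries a BOM.
with_bom = codecs.BOM_UTF8 + b'{"a": 1}'
without_bom = b'{"a": 1}'
assert load_json_ignoring_bom(with_bom) == load_json_ignoring_bom(without_bom)
```

A consumer that does not do this and parses the decoded text directly will typically treat the U+FEFF as a syntax error, which is the interoperability concern the RFC text is addressing.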

ECMA-404 places no requirements on the encoding of the JSON content, nor on the presence or absence of a BOM.

My conclusions, if we choose to defer to either RFC 8259 or ECMA-404 as the JSON specification and don't add additional restrictions, are that:

  1. Implementations could choose whatever encoding they like for the JSON file.
  2. Implementations could choose whether to produce and consume a BOM.

Tom.