Thank you for writing this up, Thiago!

On 9/5/19 12:12 AM, Thiago Macieira wrote:
== Transport ==
P1689 suggests using JSON. I'm comparing that in the context of the three 
options with a binary format (CBOR).

One thing SG16 is completely in agreement of is that if you go with JSON, you 
must obey RFC 8259: there must not be a BOM and the file must be encoded in 

We haven't polled anything, so saying we're all in agreement is premature.  Additionally, we discussed this further in the SG16 meeting yesterday and I think we determined that a BOM *may* be present.

RFC 8259 section 8.1 states: (emphasis mine)

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text.  However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.

Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text.  In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

My reading of this is that RFC 8259 permits use of non-UTF-8 encodings in some situations.  Whether the situation that P1689 is defined for qualifies is something that could be debated.  If we consider the build system and compiler invocations to form a closed ecosystem, then the dependency file could be, for example, EBCDIC-encoded JSON and still conform to RFC 8259.  I'm not arguing for or against such a position at this time, but rather noting that, if SG15 requires UTF-8-encoded JSON, that requirement is arguably more restrictive than what RFC 8259 requires.
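To make the closed-ecosystem case concrete, here is a minimal Python sketch, not a proposal.  It assumes the producer and consumer have agreed out of band on IBM code page 037 (one EBCDIC variant, available in Python as "cp037").  The record contents and the helper name parse_with_agreed_encoding are hypothetical and are not taken from P1689.

```python
import json

# Hypothetical dependency record; the keys are illustrative only.
record = {"source": "foo.cpp", "requires": ["bar"]}

text = json.dumps(record)

# The same JSON text serialized in two different encodings.
ebcdic_bytes = text.encode("cp037")  # EBCDIC, IBM code page 037
utf8_bytes = text.encode("utf-8")


def parse_with_agreed_encoding(data: bytes, encoding: str):
    """Decode with the encoding both tools agreed on, then parse the text."""
    return json.loads(data.decode(encoding))


ebcdic_result = parse_with_agreed_encoding(ebcdic_bytes, "cp037")
utf8_result = parse_with_agreed_encoding(utf8_bytes, "utf-8")
```

Both calls recover the same record even though the byte sequences differ, which is the sense in which the encoding is a property of the exchange rather than of the JSON text itself.  The consumer has to be told the encoding; nothing in the bytes announces it.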

My reading of the BOM requirements is that they only apply to UTF-8 data sent over the network and that use of a BOM in file contents is permitted.

ECMA 404 does not specify any requirements on encoding of the JSON content, nor the presence or absence of a BOM.

My conclusions are that, if we choose to adopt either RFC 8259 or ECMA 404 as the JSON specification we defer to, and if we don't add additional restrictions:

  1. Implementations could choose whatever encoding they like for the JSON file.
  2. Implementations could choose whether to produce and consume a BOM.
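
As an illustration of the second point, here is a minimal Python sketch of a consumer that accepts a UTF-8 BOM when one is present, as RFC 8259 says a parser MAY do.  The function name load_json_bytes and its default encoding are my assumptions for the example; nothing here is specified by P1689.

```python
import codecs
import json


def load_json_bytes(data: bytes, encoding: str = "utf-8"):
    """Parse JSON from bytes, skipping a leading UTF-8 BOM if there is one."""
    is_utf8 = encoding.lower() in ("utf-8", "utf8")
    if is_utf8 and data.startswith(codecs.BOM_UTF8):
        # Drop the three BOM bytes (EF BB BF) before decoding.
        data = data[len(codecs.BOM_UTF8):]
    return json.loads(data.decode(encoding))


plain = b'{"a": 1}'
with_bom = codecs.BOM_UTF8 + plain

from_plain = load_json_bytes(plain)
from_bom = load_json_bytes(with_bom)
```

Both inputs parse to the same value.  A consumer that decodes the bytes as UTF-8 without stripping the BOM and hands the resulting string to Python's json.loads gets an error instead, so whether a BOM is tolerated is an implementation choice that the specification we defer to would leave open.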