Date: Thu, 5 Sep 2019 11:20:58 -0400
Thank you for writing this up, Thiago!
On 9/5/19 12:12 AM, Thiago Macieira wrote:
> == Transport ==
> P1689 suggests using JSON. I'm comparing that in the context of the three
> options with a binary format (CBOR).
>
> One thing SG16 is completely in agreement of is that if you go with JSON, you
> must obey RFC 8259: there must not be a BOM and the file must be encoded in
> UTF-8.
We haven't polled anything, so saying we're all in agreement is
premature. Additionally, we discussed this further in the SG16 meeting
yesterday and I think we determined that a BOM *may* be present.
RFC 8259 section 8.1 states: (emphasis mine)

    JSON text exchanged between systems *that are not part of a closed
    ecosystem* MUST be encoded using UTF-8 [RFC3629].

    Previous specifications of JSON have not required the use of UTF-8
    when transmitting JSON text. However, the vast majority of
    JSON-based software implementations have chosen to use the UTF-8
    encoding, to the extent that it is the only encoding that achieves
    interoperability.

    Implementations MUST NOT add a byte order mark (U+FEFF) to the
    beginning of a *networked-transmitted JSON text*. In the interests
    of interoperability, implementations that parse JSON texts *MAY
    ignore the presence of a byte order mark* rather than treating it as
    an error.

My reading of this is that RFC 8259 permits use of non-UTF-8 encodings
in some situations. Whether the situation that P1689 is defined for
qualifies is something that could be debated. If we consider the build
system and compiler invocations to form a closed system, then the
dependency file could be, for example, EBCDIC encoded JSON and still
conform to RFC 8259. I'm not arguing for or against such a position at
this time; but rather noting that, if SG15 requires UTF-8 encoded JSON,
that requirement is arguably more restrictive than what RFC 8259 requires.
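
To make the closed-ecosystem point concrete, here is a minimal Python sketch. It is only an illustration, not something P1689 or SG16 proposes. It assumes Python's `cp037` codec as a stand-in for "EBCDIC" (there are several EBCDIC code pages), and the helper names and the record contents are hypothetical.

```python
import json

# Assumption: cp037 is just one of several EBCDIC code pages.
EBCDIC_CODEC = "cp037"


def write_record(record: dict, encoding: str) -> bytes:
    """Serialize a record as JSON text, then encode it with the given codec."""
    return json.dumps(record).encode(encoding)


def read_record(raw: bytes, encoding: str) -> dict:
    """Decode the bytes with the given codec, then parse the JSON text."""
    return json.loads(raw.decode(encoding))


if __name__ == "__main__":
    record = {"source": "foo.cpp"}  # hypothetical payload, not the P1689 schema
    raw = write_record(record, EBCDIC_CODEC)

    # Producer and consumer agree on the encoding: the round trip works.
    print(read_record(raw, EBCDIC_CODEC) == record)

    # A consumer outside that agreement, assuming UTF-8, cannot read the file.
    try:
        read_record(raw, "utf-8")
    except UnicodeDecodeError as err:
        print("UTF-8 consumer failed:", err.reason)
```

The file is readable only by tools that share the producer's encoding, which is the interoperability cost a UTF-8 requirement from SG15 would avoid.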
My reading of the BOM requirements is that they only apply to UTF-8 data
sent over the network and that use of a BOM in file contents is permitted.
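
As an illustration of the "MAY ignore" wording, a consumer of UTF-8 encoded JSON could be sketched as below. This is a minimal Python sketch under my own assumptions: the function name and the `allow_bom` policy switch are hypothetical and do not come from RFC 8259 or P1689.

```python
import codecs
import json


def parse_dependency_json(raw: bytes, allow_bom: bool = True):
    """Parse UTF-8 encoded JSON bytes, optionally tolerating a leading BOM.

    allow_bom=True follows the RFC 8259 allowance that a parser MAY ignore
    a byte order mark; allow_bom=False models a stricter local policy.
    """
    if raw.startswith(codecs.BOM_UTF8):
        if not allow_bom:
            raise ValueError("leading UTF-8 BOM rejected by policy")
        # Drop the three BOM bytes (EF BB BF) before decoding.
        raw = raw[len(codecs.BOM_UTF8):]
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    payload = b'{"a": 1}'  # hypothetical file contents
    print(parse_dependency_json(payload))
    print(parse_dependency_json(codecs.BOM_UTF8 + payload))
```

Whether a producer may write the BOM in the first place is a separate question that the consumer-side allowance does not settle.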
ECMA 404 does not specify any requirements on encoding of the JSON
content, nor the presence or absence of a BOM.
My conclusions, if we defer to either RFC 8259 or ECMA 404 as the JSON
specification and don't add further restrictions of our own, are:
1. Implementations could choose whatever encoding they like for the
JSON file.
2. Implementations could choose whether to produce and consume a BOM.
Tom.
Received on 2019-09-05 17:21:01