Date: Fri, 8 Mar 2019 10:59:40 -0500
On Wed, Mar 06, 2019 at 12:13:39 -0500, Ben Boeckel wrote:
> On Mon, Mar 04, 2019 at 17:57:53 -0500, Ben Boeckel wrote:
> > Defined formats (I'm fine with bikeshedding these names once the overall
> > format has been hammered out):
> >
> > - "raw8": interpret `data` as an array of uint8_t bytes to be passed
> > to platform-specific filesystem APIs as an 8-bit encoding
> > - "raw16": interpret `data` as an array of uint16_t bytes to be passed
> > to platform-specific filesystem APIs as a 16-bit encoding
> >
> > This basically means "check if it is UTF-8, if it is, escape `\` and `"`
> > and output that, otherwise indicate the byte size of the data and write
> > it as an integer array".
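The "check if it is UTF-8, otherwise write an integer array" rule quoted above can be sketched in Python. This is a rough illustration, not code from the proposal. The function name `encode_path`, the `format` key name, and passing 16-bit paths in as a list of code units are all assumptions.

```python
import json
import struct


def encode_path(native):
    """Hypothetical encoder for a native path.

    native: bytes for 8-bit filesystem APIs (e.g. POSIX), or a list of
    16-bit code units for 16-bit APIs (e.g. Windows wide-char calls).
    Returns a plain string when the path is valid Unicode, else a
    {"format": ..., "data": [...]} object (the "format" key is assumed).
    """
    if isinstance(native, bytes):
        try:
            # Valid UTF-8: emit as a string; json.dumps escapes \ and ".
            return native.decode("utf-8")
        except UnicodeDecodeError:
            return {"format": "raw8", "data": list(native)}
    # 16-bit units: a strict UTF-16 decode rejects unpaired surrogates,
    # which cannot be represented in UTF-8.
    raw = struct.pack("<%dH" % len(native), *native)
    try:
        return raw.decode("utf-16-le")
    except UnicodeDecodeError:
        return {"format": "raw16", "data": list(native)}


print(json.dumps(encode_path(b"caf\xe9.txt")))
```

The result is meant to be passed through `json.dumps`, so string escaping is left to the JSON library rather than done by hand.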
An idea that came up here to improve support for almost-UTF-8 and
non-UTF-8 filenames:
- Filenames may contain URL escape sequences (cf. RFC 1738) for
  non-UTF-8 bytes. This also means that a literal `%` must itself be
  URL-encoded (as `%25`).
- For systems on which UTF-8 is not (generally) a valid filepath, if
  the path can be converted unambiguously and losslessly *from* UTF-8
  to the native API requirements, converting that path to UTF-8 is
  valid (consuming tools would need to know to transcode from UTF-8
  anyway). Characters that cannot be converted this way would force a
  fallback to `raw8` or `raw16`, depending on the native API
  requirements.
* This helps Windows where there is a useful transformation between
the system API and UTF-8 except for a handful of cases.
  * Also likely helps macOS, where the source code may spell a
    filename in a different Unicode normalization form than the
    filesystem, but the native form can be restored for the API.
* Additionally helps EBCDIC since most filenames there should be
able to roundtrip.
- Specify `readable` as an optional string property on filepath
  objects in the spec, from which consumers MUST NOT (RFC 2119) infer
  any semantic meaning (treat it like a comment).
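The percent-escape idea from the first bullet can be sketched in Python as well. Valid UTF-8 passes through unchanged, `%` becomes `%25`, and every byte that is not part of a valid UTF-8 sequence becomes `%XX`. The function names are hypothetical, and using uppercase hex digits is a choice rather than something the proposal specifies. The sketch relies on Python's `surrogateescape` error handler, which maps each undecodable byte `b` to U+DC00 + `b`, and on `urllib.parse.unquote_to_bytes` for the reverse direction.

```python
from urllib.parse import unquote_to_bytes


def escape_path(native: bytes) -> str:
    """Hypothetical: percent-escape non-UTF-8 bytes and literal '%'."""
    text = native.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "%":
            out.append("%25")  # '%' must be escaped to stay unambiguous
        elif 0xDC80 <= cp <= 0xDCFF:
            # Undecodable byte smuggled in by surrogateescape.
            out.append("%%%02X" % (cp - 0xDC00))
        else:
            out.append(ch)
    return "".join(out)


def unescape_path(text: str) -> bytes:
    """Hypothetical inverse: %XX -> byte, everything else -> UTF-8."""
    return unquote_to_bytes(text)


print(escape_path(b"caf\xe9 100%.txt"))
```

Round-tripping through `unescape_path` recovers the original bytes exactly, which is the property the proposal needs for non-UTF-8 filenames.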
Does this ease concerns about non-UTF-8 paths?
--Ben
Received on 2019-03-08 16:59:44