sg15

Re: [Tooling] Dependency information for module-aware build tools

From: Ben Boeckel <ben.boeckel_at_[hidden]>
Date: Fri, 8 Mar 2019 10:59:40 -0500
On Wed, Mar 06, 2019 at 12:13:39 -0500, Ben Boeckel wrote:
> On Mon, Mar 04, 2019 at 17:57:53 -0500, Ben Boeckel wrote:
> > Defined formats (I'm fine with bikeshedding these names once the overall
> > format has been hammered out):
> >
> > - "raw8": interpret `data` as an array of uint8_t bytes to be passed
> > to platform-specific filesystem APIs as an 8-bit encoding
> > - "raw16": interpret `data` as an array of uint16_t bytes to be passed
> > to platform-specific filesystem APIs as a 16-bit encoding
> >
> > This basically means "check if it is UTF-8, if it is, escape `\` and `"`
> > and output that, otherwise indicate the byte size of the data and write
> > it as an integer array".
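The quoted rule can be sketched roughly as follows. This is only an illustration in Python: the function name `encode_path` and the `format`/`data` object layout are assumptions, since the exact JSON shape is not settled in this thread, and only the `raw8` fallback is shown.

```python
import json

def encode_path(raw: bytes) -> str:
    """Return a JSON fragment for a path given as raw bytes (hypothetical helper)."""
    try:
        # Strict decoding raises on anything that is not valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: state the unit size and write the bytes as an integer array.
        # The "format"/"data" key names are assumed for this sketch.
        return json.dumps({"format": "raw8", "data": list(raw)})
    # json.dumps escapes `\` and `"` (and control characters);
    # ensure_ascii=False keeps non-ASCII characters as-is.
    return json.dumps(text, ensure_ascii=False)
```

A `raw16` variant would do the same with 16-bit units taken from the native wide-character path.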

An idea that came up here to improve support for almost-UTF-8 and
non-UTF-8 filenames:

  - Filenames may contain URL escape sequences (cf. RFC 1738) for
    non-UTF-8 bytes. This also means that a literal `%` must itself be
    URL-encoded (as `%25`).
  - For systems for which UTF-8 is not (generally) a valid filepath, if
    the path can be converted unambiguously and losslessly *from* UTF-8
    to the native API requirements, converting that path to UTF-8 is
    valid (consuming tools would need to know to transcode from UTF-8
    anyways). Invalid characters for this case would force a fallback to
    `raw8` or `raw16` depending on the native API requirements.
    * This helps Windows where there is a useful transformation between
      the system API and UTF-8 except for a handful of cases.
    * Also likely helps macOS where the wrong collation is present in
      the source code, but can be restored for the native API.
    * Additionally helps EBCDIC since most filenames there should be
      able to roundtrip.
  - Specify `readable` as an optional string property on filepaths in
    the spec, from which consumers MUST NOT (RFC 2119) infer semantic
    meaning (treat it like a comment).
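
To make the first bullet concrete, here is a minimal sketch of the escaping for a POSIX-style byte path, written in Python. The names `escape_path` and `unescape_path` are made up for illustration, and the sketch assumes that only `%` and the bytes that are not part of valid UTF-8 get escaped; everything else passes through untouched.

```python
def escape_path(raw: bytes) -> str:
    """Percent-escape bytes that are not valid UTF-8, plus literal `%`."""
    # surrogateescape maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, so the invalid bytes can be found afterwards.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(f"%{cp - 0xDC00:02X}")  # the original invalid byte
        elif ch == "%":
            out.append("%25")  # keep `%` unambiguous
        else:
            out.append(ch)
    return "".join(out)

def unescape_path(text: str) -> bytes:
    """Invert escape_path; raises ValueError on a malformed escape."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(text[i].encode("utf-8"))
            i += 1
    return bytes(out)
```

With this, the bytes `b"caf\xe9 100%.cpp"` become the string `caf%E9 100%25.cpp`, and `unescape_path` gives back the original bytes. A path that is already valid UTF-8 and has no `%` in it comes out unchanged.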

Does this ease concerns about non-UTF-8 paths?

--Ben

Received on 2019-03-08 16:59:44