Re: [Tooling] [isocpp-modules] Dependency information for module-aware build tools

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 7 Mar 2019 00:15:34 -0500
On 3/5/19 3:50 PM, Richard Smith via Modules wrote:
> On Mon, Mar 4, 2019 at 2:58 PM Ben Boeckel via Modules
> <modules_at_[hidden]> wrote:
>
>
> Defined formats (I'm fine with bikeshedding these names once the overall
> format has been hammered out):
>
> - "raw8": interpret `data` as an array of uint8_t bytes to be passed
>   to platform-specific filesystem APIs as an 8-bit encoding
> - "raw16": interpret `data` as an array of uint16_t bytes to be passed
>   to platform-specific filesystem APIs as a 16-bit encoding
> - "raw32": interpret `data` as an array of uint32_t bytes to be passed
>   to platform-specific filesystem APIs as a 32-bit encoding
>
>
> Are there any platforms that have APIs expecting a 32-bit encoding? I
> would expect UTF-8 (hopefully the common case), raw8 (for
> non-Windows), and raw16 (for Windows) to be the only formats we need.

I don't know of any that use 32-bit code units for file names.

I find myself thinking (as I so often do these days, much to the surprise
of my past self): how do EBCDIC and z/OS fit in here? If we stick to
JSON and require the dependency file to be UTF-8 encoded, would all file
names in these files be raw8 encoded and effectively unreadable (by
humans) on z/OS? Perhaps we could allow more flexibility, but doing so
necessarily invites locales into the discussion (for those who are
unaware, EBCDIC has code pages too). For example, we could require that
the selected locale match between the producers and consumers of the
file (UB if they don't). We could then permit use of the string
representation, produced by transcoding the locale-interpreted physical
file name to UTF-8, but only if reverse-transcoding produces the same
physical file name; otherwise the appropriate raw format must be used.
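
To make the round-trip rule concrete, here is a minimal C++17 sketch. Every
name in it (`Transcoder`, `EncodedName`, `encode_file_name`, `toy_transcoder`,
and the "utf8" format tag) is hypothetical and not part of any proposed
format. The toy transcoder is a stand-in for a real locale-aware conversion
(an EBCDIC code page, for instance), which would need a platform library and
is not shown.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical pair of conversions for the locale selected by both the
// producer and the consumer of the dependency file.
struct Transcoder {
    std::function<std::optional<std::string>(const Bytes&)> to_utf8;
    std::function<std::optional<Bytes>(const std::string&)> from_utf8;
};

// Hypothetical result: either a readable string or the raw bytes.
struct EncodedName {
    std::string format;  // "utf8" (assumed tag) or "raw8"
    std::string text;    // used when format == "utf8"
    Bytes data;          // used when format == "raw8"
};

// Use the string form only if transcoding to UTF-8 and back reproduces
// the physical file name exactly; otherwise fall back to raw8.
EncodedName encode_file_name(const Bytes& physical, const Transcoder& tc) {
    if (auto utf8 = tc.to_utf8(physical)) {
        auto back = tc.from_utf8(*utf8);
        if (back && *back == physical) {
            return {"utf8", *utf8, {}};
        }
    }
    return {"raw8", {}, physical};
}

// Toy stand-in for a locale: ASCII passes through, and any byte >= 0x80
// is decoded lossily to '?', so such names cannot round-trip.
Transcoder toy_transcoder() {
    return Transcoder{
        [](const Bytes& in) -> std::optional<std::string> {
            std::string out;
            for (std::uint8_t b : in) {
                out.push_back(b < 0x80 ? static_cast<char>(b) : '?');
            }
            return out;
        },
        [](const std::string& in) -> std::optional<Bytes> {
            Bytes out;
            for (char c : in) {
                auto u = static_cast<unsigned char>(c);
                if (u >= 0x80) return std::nullopt;  // not encodable here
                out.push_back(u);
            }
            return out;
        }};
}
```

With the toy transcoder, the bytes {0x6D, 0x2E, 0x63} ("m.c") round-trip and
are stored as a string, while {0x61, 0xFF} fall back to raw8 because 0xFF
decodes to '?' and '?' encodes back to 0x3F rather than 0xFF.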

Tom.


Received on 2019-03-07 06:15:37