On 3/5/19 3:50 PM, Richard Smith via Modules wrote:

On Mon, Mar 4, 2019 at 2:58 PM Ben Boeckel via Modules <modules@lists.isocpp.org> wrote:

Defined formats (I'm fine with bikeshedding these names once the overall
format has been hammered out):

- "raw8": interpret `data` as an array of uint8_t bytes to be passed
to platform-specific filesystem APIs as an 8-bit encoding
- "raw16": interpret `data` as an array of uint16_t code units to be passed
to platform-specific filesystem APIs as a 16-bit encoding
- "raw32": interpret `data` as an array of uint32_t code units to be passed
to platform-specific filesystem APIs as a 32-bit encoding
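
To make the raw formats above concrete, here is a minimal sketch of how a producer might emit a file name as "raw8" or "raw16" and how a consumer might read it back. The JSON entry shape (`format` and `data` keys) and the helper names are assumptions made for illustration; the thread has not settled on them.

```python
import json

def encode_raw8(path_bytes: bytes) -> dict:
    # Hypothetical entry shape: each byte of the file name becomes one JSON integer.
    return {"format": "raw8", "data": list(path_bytes)}

def encode_raw16(name: str) -> dict:
    # "surrogatepass" keeps unpaired surrogates, which Windows file names may contain.
    raw = name.encode("utf-16-le", "surrogatepass")
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    return {"format": "raw16", "data": units}

def decode_entry(entry: dict):
    # Returns bytes for raw8 and str for raw16, ready for the platform's file APIs.
    if entry["format"] == "raw8":
        return bytes(entry["data"])
    if entry["format"] == "raw16":
        raw = b"".join(unit.to_bytes(2, "little") for unit in entry["data"])
        return raw.decode("utf-16-le", "surrogatepass")
    raise ValueError("unsupported format: " + repr(entry["format"]))

# The entry survives a trip through JSON even though 0xFF is not valid UTF-8.
entry = json.loads(json.dumps(encode_raw8(b"a\xff.cpp")))
```

Because the data is carried as integers, the dependency file itself stays valid UTF-8 JSON regardless of what the file name contains.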

Are there any platforms that have APIs expecting a 32-bit encoding? I would expect UTF-8 (hopefully the common case), raw8 (for non-Windows), and raw16 (for Windows) to be the only formats we need.

I don't know of any that use 32-bit code units for file names.

I find myself thinking (as I so often do these days, much to the surprise of my past self): how do EBCDIC and z/OS fit in here? If we stick to JSON and require the dependency file to be UTF-8 encoded, would all file names in these files be raw8 encoded and effectively unreadable (by humans) on z/OS? Perhaps we could allow more flexibility, but doing so necessarily invites locales into the discussion (for those who are unaware, EBCDIC has code pages too). For example, we could require that the selected locale match between the producers and consumers of the file (UB if they don't). The string representation would then be permitted by transcoding the physical file name, as interpreted in that locale, to UTF-8, but only if reverse-transcoding produces the same physical file name; otherwise the appropriate raw format must be used.
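
The round-trip rule could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the "string" format label, and the entry shape are hypothetical, and Python's `cp037` codec stands in for whichever EBCDIC code page the selected locale implies.

```python
def choose_representation(physical: bytes, locale_encoding: str) -> dict:
    # Try the readable form: interpret the physical name in the selected locale.
    try:
        text = physical.decode(locale_encoding)
        # Only accept it if transcoding back yields the identical physical name.
        if text.encode(locale_encoding) == physical:
            return {"format": "string", "data": text}
    except (UnicodeDecodeError, UnicodeEncodeError):
        pass
    # Otherwise fall back to the raw form, which is lossless but unreadable.
    return {"format": "raw8", "data": list(physical)}

# An EBCDIC-encoded name round-trips, so it can be stored in readable form.
ebcdic_name = "foo.cpp".encode("cp037")
readable = choose_representation(ebcdic_name, "cp037")

# A name that is not valid in the selected locale falls back to raw8.
fallback = choose_representation(b"a\xff", "utf-8")
```

The consumer would have to apply the same locale to turn the string back into the physical name, which is why a mismatch between producer and consumer is the dangerous case.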

Tom.