On Mon, Mar 4, 2019 at 2:58 PM Ben Boeckel via Modules <modules@lists.isocpp.org> wrote:
Defined formats (I'm fine with bikeshedding these names once the overall
format has been hammered out):
- "raw8": interpret `data` as an array of uint8_t bytes to be passed
to platform-specific filesystem APIs as an 8-bit encoding
- "raw16": interpret `data` as an array of uint16_t bytes to be passed
to platform-specific filesystem APIs as a 16-bit encoding
- "raw32": interpret `data` as an array of uint32_t bytes to be passed
to platform-specific filesystem APIs as a 32-bit encoding
Are there any platforms that have APIs expecting a 32-bit encoding? I would expect UTF-8 (hopefully the common case), raw8 (for non-Windows), and raw16 (for Windows) to be the only formats we need.
I don't know of any that use 32-bit code units for file names.
I find myself thinking (as I so often do these days much to the
surprise of my past self), how do EBCDIC and z/OS fit in here?
If we stick to JSON and require the dependency file to be UTF-8
encoded, would all file names in these files be raw8 encoded and
effectively unreadable (by humans) on z/OS? Perhaps we could
allow more flexibility, but doing so necessarily invites locales
into the discussion (for those that are unaware, EBCDIC has code
pages too). For example, we could require that the selected
locale match between the producers and consumers of the file (UB
if they don't). The string representation would then be permitted
by transcoding the locale-interpreted physical file name to
UTF-8, but only if reverse-transcoding produces the same physical
file name; otherwise the appropriate raw format must be used.
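
To make the round-trip rule above concrete, here is a minimal
Python sketch. The "format" and "data" key names, the "string"
format name, and the encode_file_name helper are my assumptions
for illustration only, not part of any proposed format. cp037 is
just one EBCDIC code page for which Python ships a codec.

```python
import json


def encode_file_name(physical: bytes, locale_codec: str) -> dict:
    """Pick a representation for one physical file name.

    `physical` is the file name as the platform stores it and
    `locale_codec` is the codec for the selected locale.
    The returned key names are hypothetical.
    """
    try:
        # Interpret the physical name in the selected locale.
        text = physical.decode(locale_codec)
        # Only use the readable form if reverse-transcoding
        # reproduces the exact physical file name.
        if text.encode(locale_codec) == physical:
            return {"format": "string", "data": text}
    except UnicodeError:
        # Not representable in this locale; fall through to raw.
        pass
    # Fallback: the raw 8-bit code units, passed through untouched.
    return {"format": "raw8", "data": list(physical)}


if __name__ == "__main__":
    ebcdic_name = "main.cpp".encode("cp037")
    # Written out as UTF-8 JSON, the name stays human-readable.
    print(json.dumps(encode_file_name(ebcdic_name, "cp037")))
    # Bytes that are not valid in the locale fall back to raw8.
    print(json.dumps(encode_file_name(b"caf\xe9.cpp", "utf-8")))
```

A consumer would apply the inverse: encode a "string" entry with
its own locale codec, which is why the producer and consumer
locales would have to match.
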
Tom.