On Fri, Sep 6, 2019 at 2:07 PM Niall Douglas <s_sourceforge@nedprod.com> wrote:

> So let's go with UTF8, and tell tools not to spit out files that can't
> be found via UTF8. How many of the tools we currently use already have
> those limitations?

Build tooling does not control what directory it gets run from.

But developers do.

The OP's paper requires relative paths always, so one could insist on
perfect UTF8 convertibility there. But do remember that UTF8
convertibility to and from the native filesystem encoding, where the
native filesystem encoding varies between programs and program runs,
cannot be reliably determined in advance.

I probably misunderstand - why is "in advance" necessary?

Let's say the filename is representable in UTF8 - that's the restriction I'm suggesting.

I'm going to call the JSON-or-whatever the "SG15 format", so as to avoid JSON discussion.

Let's say I'm a tool such as an IDE. I have a filename, probably read from the filesystem in whatever encoding the IDE is currently running with. (i.e. NOT read from the SG15 format)

If the filename was originally written in some other encoding than the one I'm currently running with, I probably already have the wrong filename. Nothing we can do to help with that.

So let's say I read the filename using the right filesystem encoding, and have the name in that encoding. (Or somehow managed to convert, etc)

Now I want to WRITE the SG15 format.

I have a filename, and have the filesystem encoding. Because we said the filename is representable in UTF8, I can convert, yes? (Now, maybe there is more than one conversion... normalization...)
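
To make the normalization aside concrete, here is a small Python sketch. The filename is made up for illustration. It only shows that the same visible name can have two different UTF8 byte sequences depending on whether it is composed (NFC) or decomposed (NFD), so a writer and a reader would presumably need to agree on one form, or compare after normalizing.

```python
import unicodedata

# Hypothetical filename containing U+00E9 (e with acute accent).
name = "caf\u00e9.cpp"

# Two canonically equivalent spellings of the same visible name.
nfc = unicodedata.normalize("NFC", name)  # one code point for the accented e
nfd = unicodedata.normalize("NFD", name)  # plain 'e' followed by combining U+0301

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# The UTF8 byte sequences differ even though the names are equivalent.
same_bytes = nfc_bytes == nfd_bytes

# Normalizing both sides to one agreed form makes them compare equal again.
same_after_normalizing = (
    unicodedata.normalize("NFC", nfd) == unicodedata.normalize("NFC", nfc)
)
```

Which form (if any) the SG15 format should require is a separate question; the sketch only shows why the question comes up.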

OK, now I'm another tool, and want to read the SG15 format.

I'm running with some filesystem encoding. I have a UTF8 filename. Can I convert to filesystem encoding?

Well if my filesystem is ASCII-only, then maybe not. But we can't fix that. The user needs to know that their filesystem restricts them to ASCII only. Similarly if their filesystem is some other subset of Unicode.

But if the filesystem encoding can represent everything UTF8 can (or at least every character in the filename), and you know the current filesystem encoding (which may differ from the encoding the UTF8 was originally converted from), then you can do UTF8 -> filesystem?
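
Here is the write/read sequence above as a minimal Python sketch. The function names `to_sg15` and `from_sg15`, the filename, and the choice of Latin-1, UTF8, CP437 and ASCII as "filesystem encodings" are all my own assumptions for illustration, not anything from the paper. Filenames are modelled as raw bytes plus the name of the encoding the tool believes is in effect.

```python
def to_sg15(native_name: bytes, fs_encoding: str) -> bytes:
    """Writer side: native filesystem bytes -> UTF8 bytes for the SG15 format."""
    # Raises UnicodeDecodeError if the bytes are not valid in fs_encoding.
    text = native_name.decode(fs_encoding)
    return text.encode("utf-8")


def from_sg15(utf8_name: bytes, fs_encoding: str) -> bytes:
    """Reader side: UTF8 bytes from the SG15 format -> native filesystem bytes."""
    text = utf8_name.decode("utf-8")
    # Raises UnicodeEncodeError if the name is not representable in fs_encoding.
    return text.encode(fs_encoding)


# A writer running with Latin-1 sees these bytes on disk (hypothetical name).
native = "caf\u00e9.cpp".encode("latin-1")
stored = to_sg15(native, "latin-1")

# A reader running with a different encoding can still convert.
reader_native = from_sg15(stored, "utf-8")

# An ASCII-only reader cannot represent the name at all.
try:
    from_sg15(stored, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# A writer that guesses the wrong encoding gets no error, just a different name.
wrong = to_sg15(native, "cp437")
silently_wrong = wrong != stored
```

In this sketch the conversions themselves are mechanical. The two places it goes wrong are the ones already mentioned: a reader whose encoding is a subset of Unicode, and a writer that has the wrong idea of the encoding, which fails silently rather than with an error.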

At which step(s) can things go wrong?

Be seeing you,

Tony