Re: [SG16-Unicode] P1689: Encoding of filenames for interchange

From: Tony V E <tvaneerd_at_[hidden]>
Date: Fri, 6 Sep 2019 14:41:06 -0400
On Fri, Sep 6, 2019 at 2:07 PM Niall Douglas <s_sourceforge_at_[hidden]>
wrote:

> > So let's go with UTF8, and tell tools not to spit out files that can't
> > be found via UTF8. How many of the tools we currently use already have
> > those limitations?
> Build tooling does not control what directory it gets run from.

But developers do.

> The OP's paper requires relative paths always, so one could insist on
> perfect UTF8 convertibility there. But do remember that UTF8
> convertibility to and from the native filesystem encoding, where the
> native filesystem encoding varies between programs and program runs,
> cannot be reliably determined in advance.

I probably misunderstand - why is "in advance" necessary?

Let's say the filename is representable in UTF8 - that's the restriction
I'm suggesting.

I'm going to call the JSON-or-whatever the "SG15 format", so as to avoid
JSON discussion.

Let's say I'm a tool such as an IDE. I have a filename, probably read from
the filesystem in whatever encoding the IDE is currently running in (i.e.
NOT read from the SG15 format).
If the filename was originally written in some other encoding than the one
I'm currently running in, I probably already have the wrong filename.
Nothing we can do to help with that.
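
To illustrate that failure mode, here is a tiny Python sketch. The filename
and both encodings (Latin-1 and cp437) are made up for the example; nothing
in P1689 names them.

```python
# Hypothetical filename, written to disk by a tool running under Latin-1.
name = "caf\u00e9.cpp"                 # "café.cpp"
on_disk = name.encode("latin-1")       # the bytes the filesystem stores

# Another tool, running under cp437, decodes the very same bytes.
misread = on_disk.decode("cp437")

# The decode succeeds without error, yet the tool now holds a different name.
wrong_name = misread != name
```

The point is that no error is raised: the tool has no way to notice, and
this happens before the SG15 format is involved at all.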

So let's say I read the filename using the right filesystem encoding, and
have the name in that encoding. (Or somehow managed to convert, etc)
Now I want to WRITE the SG15 format.
I have a filename, and have the filesystem encoding. Because we said the
filename is representable in UTF8, I can convert, yes? (Now, maybe there
is more than one conversion... normalization...)
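
A minimal sketch of that writing step, in Python. The helper name
`to_interchange` is hypothetical, and picking NFC is purely my assumption
to show where the normalization question would bite; the thread has not
settled on any form.

```python
import unicodedata

def to_interchange(native: bytes, fs_encoding: str) -> bytes:
    """Convert a filename from the filesystem encoding to UTF-8 bytes.

    Hypothetical helper, not part of any proposal.
    """
    # Raises UnicodeDecodeError if the bytes are not valid in fs_encoding.
    text = native.decode(fs_encoding)
    # Assumption: normalize to NFC so equivalent names get one spelling.
    text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")

# Precomposed e-acute from Latin-1, and "e" + combining acute from UTF-8.
from_latin1 = to_interchange("caf\u00e9.cpp".encode("latin-1"), "latin-1")
from_nfd = to_interchange("cafe\u0301.cpp".encode("utf-8"), "utf-8")
```

Without the normalization line the two inputs would produce different
UTF-8 byte sequences for what a user sees as the same name.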

OK, now I'm another tool, and want to read the SG15 format.

I'm running with some filesystem encoding. I have a UTF8 filename. Can I
convert to filesystem encoding?

Well if my filesystem is ASCII-only, then maybe not. But we can't fix
that. The user needs to know that their filesystem restricts them to ASCII
only. Similarly if their filesystem is some other subset of Unicode.
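
Roughly what the reading tool would do, sketched in Python. The helper
name `from_interchange` is hypothetical, and "ascii" stands in for any
filesystem encoding that covers only a subset of Unicode.

```python
def from_interchange(utf8_name: bytes, fs_encoding: str) -> bytes:
    """Convert a UTF-8 filename to the filesystem encoding.

    Hypothetical helper. Raises UnicodeEncodeError when the name cannot be
    represented, so the failure is visible instead of silent.
    """
    text = utf8_name.decode("utf-8")
    return text.encode(fs_encoding)

plain = from_interchange(b"main.cpp", "ascii")   # representable, works

try:
    from_interchange("caf\u00e9.cpp".encode("utf-8"), "ascii")
    representable = True
except UnicodeEncodeError:
    representable = False                        # the user hits the ASCII limit
```

Here the conversion at least fails loudly, which is the most a tool can
offer a user whose filesystem cannot hold the name.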

But if the filesystem encoding can represent all of Unicode (or at least
every character in the filename), and you know the current filesystem
encoding (maybe a different encoding than the one the UTF8 was originally
converted from), you can do UTF8 -> filesystem?
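
The whole trip, sketched in Python with two made-up tools. Latin-1 for the
writer and UTF-16 for the reader are assumptions chosen only because they
give visibly different bytes.

```python
name = "caf\u00e9.cpp"                 # hypothetical filename, "café.cpp"

# Tool A runs with a Latin-1 filesystem encoding and writes the SG15 format.
native_a = name.encode("latin-1")
interchange = native_a.decode("latin-1").encode("utf-8")

# Tool B runs with a UTF-16 filesystem encoding and reads the SG15 format.
native_b = interchange.decode("utf-8").encode("utf-16-le")

# The native byte sequences differ, but both decode to the same name.
same_name = native_b.decode("utf-16-le") == native_a.decode("latin-1")
```

This only shows the case where every step knows its own encoding and the
name is representable in both; it says nothing about the cases where one
of those assumptions fails.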

At which step(s) can things go wrong?

Be seeing you,

Received on 2019-09-06 20:41:24