On Wed, Jul 3, 2019, 17:11 Niall Douglas <s_sourceforge@nedprod.com> wrote:

To explain the dependency, path_view really wants to avoid ever copying
memory.

The paper shows

bool _buffer_needs_freeing;

I find it surprising that a type named foo_view is more like a copy-on-write type than a pure view.

means we cannot use the same API as filesystem::path. So,
if on Windows you write:

path_view(u"c:/some/path");

... then the literal string gets passed directly through to the wide
character Windows syscalls, uncopied. If on POSIX, it would undergo a
just-in-time downconversion to UTF-8.

If whether a copy happens is platform-dependent, it’s pretty unfortunate that this paper and the file system API don’t adopt WTF-8 and put the conversion on the Windows side. See below.

This works by path_view inspecting the char literal passed to it for its
type. It refuses to accept `char`; it will only accept byte [1],
char8_t, char16_t and char32_t. In other words, you must explicitly
specify the UTF encoding for path view input.

Is SG16 happy for LEWG to review P1030?

Niall

[1]: Raw bytes are accepted because most filesystems will actually
accept raw binary bytes (minus a few reserved values such as '/' or 0)
as path identifiers without issue. There is no interpretation done based
on Unicode. Thus, not supporting raw bytes makes many valid filenames
impossible to work with.

To evaluate this, it would be important to state what the semantics for bytes are on Windows. Interpreting them according to the “ANSI” code page of the process would be traditional. However, it cannot address every file, because NT paths can contain characters that the code page does not represent, and it goes directly against the motivation stated.

I encourage the committee to look at supporting WTF-8 (https://simonsapin.github.io/wtf-8/) as an 8-bit-code-unit encoding that

1) Allows addressing all NT file paths

2) Is equivalent to UTF-8 for those NT file paths that have a textual interpretation (i.e. paths that are well-formed UTF-16).
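To make the two properties concrete, here is a minimal sketch of the UTF-16 to WTF-8 direction. The function names are hypothetical. Surrogate pairs are combined exactly as in UTF-8. An unpaired surrogate, which an NT path may contain, is encoded as its own three-byte sequence instead of being rejected. For well-formed UTF-16 input, the output is therefore byte-for-byte UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Append one code point as generalized UTF-8. Unlike strict UTF-8, values in
// the surrogate range U+D800..U+DFFF are allowed here (WTF-8 needs them).
void append_generalized_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert potentially ill-formed UTF-16 (as NT paths may be) to WTF-8.
std::string wtf8_from_utf16(std::u16string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        const bool is_lead = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_lead && i + 1 < in.size()) {
            const char32_t next = in[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // Well-formed surrogate pair: encode the supplementary code point.
                append_generalized_utf8(out, 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i;
                continue;
            }
        }
        // BMP character or unpaired surrogate: encode the unit's value directly.
        append_generalized_utf8(out, unit);
    }
    return out;
}
```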

This allows for portable code (where, unlike in the example above, the platform-dependent conversion goes on the Windows side). On POSIX-like platforms, you’d instead use a sequence of bytes to address all file paths, and if the path happens to be valid UTF-8, it has a textual interpretation. To enable even more portable application code, you’d make the file system library swap \ and / on Windows, similarly to how : and / are swapped to expose HFS+ in a POSIX-compatible way.
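The Windows-side half of that arrangement might look like the sketch below. The function name is hypothetical, and validation is omitted apart from a truncation check, so the input is assumed to be well-formed WTF-8. The function decodes WTF-8 back into UTF-16 code units suitable for the wide-character Windows API and swaps `/` and `\` along the way. Unpaired surrogates survive the round trip unchanged.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical sketch: turn a portable WTF-8 path into UTF-16 code units for a
// wide-character Windows API, swapping '/' and '\\' along the way.
// Assumes well-formed WTF-8; only truncated sequences are detected.
std::u16string windows_path_from_wtf8(std::string_view in) {
    static constexpr unsigned char lead_mask[] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto lead = static_cast<unsigned char>(in[i]);
        // Sequence length from the lead byte (1 to 4 bytes, as in UTF-8).
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (i + len > in.size()) {
            throw std::invalid_argument("truncated WTF-8 sequence");
        }
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        char32_t cp = lead & lead_mask[len];
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;
        if (cp >= 0x10000) {
            // Supplementary code point: emit a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else if (cp == U'/') {
            out += u'\\';
        } else if (cp == U'\\') {
            out += u'/';
        } else {
            // BMP character or unpaired surrogate, passed through unchanged.
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}
```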

(The Rust file system API uses WTF-8 on Windows enabling portable application layer code that can address all file paths in both Windows and POSIX with cheap conversion to/from UTF-8 strings.)