Re: [SG16-Unicode] SG16 approval for LEWG to review std::filesystem::path_view

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Wed, 3 Jul 2019 21:43:28 +0100
On 03/07/2019 19:17, JeanHeyd Meneide wrote:
> Dear Niall Douglas,
>
> FYI I was completely unaware that SG16 had discussed P1030. Nobody
> told me.
>
>
> Apologies, that was my failing: I think I had only briefly
> mentioned it during my first Standards Meeting after I first met
> you in Rapperswil, Switzerland, but I likely should have delivered the
> feedback via e-mail rather than as a brief gloss-over in person in the
> most timid of fashions.

This rings a bell, actually. Was it when we were discussing the
implementation of std::embed? The problem with WG21 meetings is that
they are too intense; easy recall afterwards is always a problem.

> I should add that the killing off of char input was strongly requested
> by Billy. I got the feeling it was a red line for him. I can understand
> why, from a MSVC-implementer perspective, and I have witnessed first
> hand the brokenness of char input to path on Windows.
>
>
> I am very glad that `char` is not included here. My only potential
> concern is that Linux-exclusive users will cry out. But then again, so
> will MSVC users with L"" strings. "Everyone suffers equally" is cold
> comfort, though it's completely understandable why.

For me personally, if it's a "fixing lots of unintentionally broken
code" kind of source incompatibility when someone ups the C++ standard
version in the compiler, I find that acceptable. Source breakage which
can be easily repaired with find and replace regex in files is in my
opinion very acceptable indeed.

> The aim is for path_view to be usually no worse than path, nothing more.
> If the input is in UTF-8, and the system API requires UTF-16, then you
> need to convert, same as for path. Unless you want to push mandatory
> #ifdef-ing onto the end user, which I don't think we want.
>
>
> Right, I think what was lost in the original example upon Tom's
> reading was that it _always_ converted. That's not the case for
> path_view: it will only convert if the passed-in encoding does not match
> the native file system's encoding, and only do such a conversion when
> necessary. If the user passes in UTF8 on POSIX, no converting will be done.

Correct.

There is one other situation where Unicode conversion may occur - it is
when two path views of differing character backing are compared. In this
situation, one of the path views gets converted in order to perform a
string comparison. My personal preference is that both get converted to
whatever is native on the platform if necessary, but I can see people
objecting to the potential inefficiency of that.
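
As a rough illustration of the two conversion cases described above, here is
a minimal sketch. It is not the P1030 interface: the names render_native,
equal_paths and utf16_to_utf8 are hypothetical, plain char strings are
assumed to hold UTF-8, and the native API is assumed to want UTF-8 (as on
POSIX). Input that already matches is passed through untouched; conversion
only happens on a mismatch, or when views with different backing are compared.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch, not the P1030 interface. Plain char strings are
// assumed to hold UTF-8, and the native API is assumed to want UTF-8.

// Append one Unicode code point to `out` as UTF-8.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8; unpaired surrogates become U+FFFD.
inline std::string utf16_to_utf8(std::u16string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}

// Input already matches the native encoding: hand it back untouched.
inline std::string_view render_native(std::string_view utf8, std::string&) {
    return utf8;
}

// Mismatch: convert into caller-provided scratch storage.
inline std::string_view render_native(std::u16string_view utf16,
                                      std::string& scratch) {
    scratch = utf16_to_utf8(utf16);
    return scratch;
}

// Mixed comparison: convert the non-native side, then compare as strings.
inline bool equal_paths(std::string_view a, std::u16string_view b) {
    return a == utf16_to_utf8(b);
}
```

In this sketch the UTF-8 overload returns a view of the caller's own buffer,
so no copy or conversion is made, while the mixed comparison pays for one
temporary conversion, which is the inefficiency people might object to.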

> Finally, I have one more... thing? It's not really a concern or a
> nit with the paper, just a bit of sadness: had we standardized a
> c_string_view of some form, we wouldn't need what will probably amount
> to "/Expects: /ptr[size] == 0 is true" on all the specifications on the
> constructors for the string view / charX + pointer overloads. I agree
> with the reasoning in the paper that there is rarely a case where users
> expect that to be the case, but it's not exactly impossible: I have
> received a lovely crop of bug reports from people seeing
> "std::string_view" overloads in my libraries and passing
> non-null-terminated strings into them, because that's what string_view
> promises. Whether or not your wording has an "expects" clause, it's not
> hard to imagine substring or other similar pointer + size
> manipulations producing hell here.
>
> Then again, this is marked as the /path/_view type. Maybe that
> will be enough visual indication to the user they should think carefully
> and not just toss in random substrings. I certainly hope it is.
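
To make the hazard described above concrete, here is a small sketch of how a
substring taken from a zero-terminated buffer stops satisfying
ptr[size] == 0. The helper names ends_with_nul and as_c_str are hypothetical
and are not taken from P1030 or any library. The check reads one element
past the view, which is only valid when the caller knows that element
exists, as it does for views into a std::string.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: true when the element just past the view is NUL.
// Only call this when sv.data()[sv.size()] is known to be readable, e.g.
// when the view refers into a std::string or a string literal.
inline bool ends_with_nul(std::string_view sv) {
    return sv.data()[sv.size()] == '\0';
}

// Hypothetical helper: return a pointer usable as a C string. The view's
// own buffer is reused when it is already terminated; otherwise the
// characters are copied into `scratch`, which supplies the terminator.
inline const char* as_c_str(std::string_view sv, std::string& scratch) {
    if (ends_with_nul(sv)) {
        return sv.data();
    }
    scratch.assign(sv);
    return scratch.c_str();
}
```

For example, given std::string full = "/tmp/data.bin", a view of the whole
string is followed by the terminator, but std::string_view(full).substr(0, 4)
is followed by '/', so passing its data() straight to a C API would read
past the intended end.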

I've not had any reports of problems from LLFIO users, most of whom are
no wizards in C++. Indeed they often haven't realised that paths weren't
being copied until I told them. For them, it all "just worked" silently
without being noticed, or rather, I get feedback along the lines of "OMG
I ported my native API based code to LLFIO and it's now 40% faster! How
is that possible?". When we dig into why, it turns out they were
previously using std::string for path manipulation, and that was gating
open file performance significantly, because opening a file can be
really fast on Linux, so a malloc + memcpy + free cycle actually matters
a lot.

Thanks for the feedback, and sorry for not remembering our conversation
at Rapperswil better. I'm getting old :(

Niall

Received on 2019-07-03 22:43:32