Re: [SG16-Unicode] SG16 approval for LEWG to review std::filesystem::path_view

From: Lyberta <lyberta_at_[hidden]>
Date: Fri, 05 Jul 2019 01:37:00 +0000
Niall Douglas:
> Thing is, can you name a real world situation where reading the byte(s)
> after the end of a path character range would blow up?

Running under valgrind.

>
> Remember, these are filesystem paths. They don't have the diversity of
> sources that a string_view would have. The chances, for example, of a
> path_view being constructed from a memory mapped region where the tail
> byte is exactly at the end of the mapped region is virtually nil. Any
> reasonably likely generation of path data is going to, at worst, have
> the character after the input be indeterminate, and not a SIGSEGV to
> read. And the standard library can legally do stuff banned in end user
> code, such as reading indeterminate bytes. This restriction on the kinds of
> input would not hold for string_view, where wrapping a whole 4Kb
> page into a string_view is an eminently sensible thing to do.

Again, this assumes some explicit anti-UB system. Under normal
circumstances I would assume that the compiler will see the UB and do some
"optimizations" that can totally blow up.

>
> And besides, this is a *documentation* thing. If the API documentation
> says "the user must guarantee that the character after is readable",
> then violating that is on the user. We can even add it as a contract
> precondition. I think that's okay, personally. It's in the same category
> as vector::operator[](vector::size()). Just don't do that.

I think any kind of documented requirement that is not enforced by the
type system is a very bad idea.
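
To make "enforced by the type system" concrete, here is a minimal sketch.
The class name zpath_view is hypothetical and comes from no proposal; it
only shows that a type can carry the "character after the end is readable
and is NUL" guarantee itself, instead of leaving it to documentation. It
assumes the source is a std::string that outlives the view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical type: it can only be built from an lvalue std::string,
// whose c_str() is guaranteed to be NUL-terminated at index size().
class zpath_view
{
public:
    explicit zpath_view(const std::string& s)
        : ptr_(s.c_str()), size_(s.size()) {}

    // Reject temporaries, which would leave the view dangling.
    explicit zpath_view(std::string&&) = delete;

    const char* c_str() const { return ptr_; }
    std::size_t size() const { return size_; }

private:
    const char* ptr_;   // points into the caller's std::string
    std::size_t size_;  // number of characters before the terminator
};
```

An API taking zpath_view instead of a raw character range cannot be handed
an unterminated buffer by accident, so no "just don't do that" clause is
needed.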

How hard is it to require users who want maximum performance to #ifdef
for their platform and manually NUL-terminate their path_views? The
remaining 99% of users will rely on current ranges semantics.
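
A rough sketch of what the default, ranges-respecting path could look like.
The helper name with_nul_terminated is made up for illustration and is not
part of the path_view proposal. It copies the range and terminates the copy,
so nothing ever reads the byte after the end of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: calls f with a NUL-terminated copy of the range.
// Only bytes inside [path.data(), path.data() + path.size()) are read.
template <class F>
auto with_nul_terminated(std::string_view path, F&& f)
{
    std::string copy(path);  // copies exactly path.size() characters
    return f(copy.c_str());  // c_str() is always NUL-terminated
}

// Stand-in for a C API that expects a NUL-terminated string.
inline std::size_t c_length(const char* s)
{
    return std::strlen(s);
}
```

Users who already hold a terminated buffer on their platform could skip the
copy behind an #ifdef; everyone else pays one copy and stays within the
range.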

> It's at least a decade or more away. POSIX wants C to implement strings
> properly first. C are still umming and ahhing about the best design for
> built in string objects, and Martin Uecker is working on a formal
> proposal for that.

Getting slightly off-topic, but in C I would expect this to be almost
perfect (minus some naming):

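#include <stddef.h> /* size_t */
#include <uchar.h>  /* char16_t, char32_t; char8_t as of C23 */

/* Each string is a length in code units plus a pointer to the units. */
struct utf8_string
{
    size_t size;
    char8_t* buffer;
};

struct utf16_string
{
    size_t size;
    char16_t* buffer;
};

struct utf32_string
{
    size_t size;
    char32_t* buffer;
};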

Of course, that would pollute the std:: namespace in C++. I hope we can get
the C library relocated to somewhere like std::c. That way both C and C++
can get good names.
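
To show why a size-plus-pointer layout needs no terminator, here is a small
C++ sketch of the UTF-16 variant. The names utf16_string_ref and
same_contents are invented for this example, and char16_t is used only
because it is available since C++11.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative only: mirrors the size + buffer layout sketched above.
struct utf16_string_ref
{
    std::size_t size;
    const char16_t* buffer;
};

// Compares two strings using the stored sizes. It never searches for a
// terminator and never reads buffer[size].
inline bool same_contents(utf16_string_ref a, utf16_string_ref b)
{
    if (a.size != b.size) return false;
    for (std::size_t i = 0; i < a.size; ++i)
        if (a.buffer[i] != b.buffer[i]) return false;
    return true;
}
```

Because the length travels with the pointer, a prefix of a larger buffer is
a valid string without copying or writing a terminator.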


Received on 2019-07-05 03:38:00