Date: Fri, 5 Jul 2019 14:53:42 +0100
On 05/07/2019 02:37, Lyberta wrote:
> Niall Douglas:
>> Thing is, can you name a real world situation where reading the byte(s)
>> after the end of a path character range would blow up?
>
> Running under valgrind.
That would be surprising.
Most likely original path sources, in order:
1. String literals.
2. User supplied (e.g. config file, GUI box).
3. OS (e.g. getcwd()).
4. Synthesised (e.g. tempfile(), rand() into a char buffer).
5. Shared memory (i.e. IPC).
Most likely path view sources, in order:
1. Composition (e.g. append B to A, pass as view).
2. Pass-through (straight from the list above).
3. Manipulation (subset slices of previous path views).
I completely agree that reading off the end of a view is a footgun in
theory.
I disagree that it is a footgun in any likely real world use of path
views. All the likely sources of filesystem paths are zero terminated
either at the end, or at some point after the end, of the path view. I
appreciate that this is a statistical argument rather than a logical
argument. I cannot *guarantee* there will be no problem. But I can say
that I think that for 99.9999% of uses, there will be no problem.
Incidentally, LLFIO and several major commercial products which
incorporate LLFIO are regularly run under ASan and valgrind. Nobody has
ever reported an issue with this design choice.
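To make the pattern under discussion concrete, here is a minimal sketch of
the "peek one past the end" technique, assuming C++17 and std::string_view
as a stand-in for a path view. The helper names ends_at_terminator() and
as_c_str() are hypothetical and are not LLFIO's actual API. The point is
that a view sliced out of a zero terminated buffer always has a readable
byte after its end, so the peek is safe for the sources listed above:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: looks at the byte just past the end of the view.
// Precondition (documented, NOT enforced): v.data()[v.size()] is readable.
inline bool ends_at_terminator(std::string_view v) noexcept {
  return v.data()[v.size()] == '\0';
}

// Hypothetical helper: yields a zero terminated string suitable for a
// syscall. If the view already ends at a terminator, no copy is made;
// otherwise the characters are copied into caller-supplied scratch storage.
inline const char *as_c_str(std::string_view v, std::string &scratch) {
  if (ends_at_terminator(v)) {
    return v.data();  // pass-through, zero copy
  }
  scratch.assign(v.data(), v.size());
  return scratch.c_str();  // std::string guarantees termination
}
```

With a string literal such as "/home/user/file.txt", a view of the whole
literal passes straight through, whereas a ten character slice covering
"/home/user" peeks at the following '/' and takes the copying path. The
peek only becomes a problem if the view ends exactly at the end of readable
memory, which is the case the precondition rules out.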
>> And besides, this is a *documentation* thing. If the API documentation
>> says "the user must guarantee that the character after is readable",
>> then violating that is on the user. We can even add it as a contract
>> precondition. I think that's okay, personally. It's in the same category
>> as vector::operator[](vector::size()). Just don't do that.
>
> I think any kind of documentation that is not enforced by the type
> system is a very bad idea.
I think a contract precondition is sufficient.
>> It's at least a decade or more away. POSIX wants C to implement strings
>> properly first. C are still umming and ahhing about the best design for
>> built in string objects, and Martin Uecker is working on a formal
>> proposal for that.
>
> Getting into slight offtopic but in C I expect this to be almost perfect
> (minus some naming):
Any native C string object won't be a C struct. It'll be a built-in
type, like a variable length array.
As with VLAs, sizeof() would be non-constant when fed a string object.
They would quack like a char * or char8_t *, so you could use them
wherever you might have char * or char8_t * argument.
Another design sticking point was how these would be passed by value
into functions. I proposed that either they default to reference
semantics, or C makes these, and whatever replaces VLAs (which are
regarded as a failure and likely to be deprecated), move-only objects.
I personally prefer the latter strongly. C++ choosing default copy
semantics for complex types was a mistake, but too late for us now.
Niall
Received on 2019-07-05 15:53:45