Date: Wed, 5 Jun 2024 14:52:19 +0100
On 05/06/2024 13:44, Frederick Virchanza Gotham via Std-Proposals wrote:
> On Wed, Jun 5, 2024 at 12:02 PM Lénárd Szolnoki wrote:
>>
>> No, that's not what I am saying. In your paper optional::emplace is
>> implemented the following way:
>>
>> template<typename T>
>> T &optional<T>::emplace(T &&&arg)
>> {
>> this->reset();
>> T &retval = *::new(this->buffer) T( arg() );
>> this->has_value_bool = true;
>> return retval;
>> }
>>
>> This is what happens when called with the "deferred" or "lazy" argument
>> `std::move(*opt_str) + " World!`.
>
>
> The expression:
>
> std::move(*opt_str) + " World!"
>
> is a function call as follows:
>
> operator+( std::move(*opt_str), " World!" );
>
> and it returns a PRvalue of type 'std::string'.
>
> As I wrote in my paper under the header "x86_64 Linux implementation",
> the compiler must emit a helper function that moves the arguments from
> memory to registers, as follows:
>
> Helper_Function: ; address of return slot is in RDI
> mov rax, rsi ; address of arguments in memory
> mov rsi, [rax+0] ; move address of "*opt_str" from
> memory to register
> mov rcx, [rax+8] ; move address of " World!" from
> memory to register
> jmp operator+(string&&,char*)
>
> More on this later. . .
>
>
>> 1. this->reset()
>> After this the optional doesn't hold a value anymore.
>>
>> 2. T& retval = *::new(this->buffer) T( arg() );
>> Here `arg()` evaluates `std::move(*opt_str) + " World!`. But `opt_str`
>> has no value anymore, therefore `*opt_str` is UB.
>
>
> The arguments to the-function-which-returns-the-PRvalue get evaluated
> before the 'emplace' method is invoked.
You are right, this is something I missed in the proposal. To be honest,
I find this a rather weird design choice. Admittedly this can still
mitigate the issue in some particular cases, but as you also found out
down below, not here.
Anyway, even if it did solve the issue, if for some reason you needed a
more complicated expression than `std::move(*opt_str) + " World!"` and
then decide to extract that into a function that takes a reference to an
optional then you are back to the original issue again, like so:
opt_str.emplace(append_world(opt_str));
Only lazily evaluating the "top-most" expression feels like a weird
place to draw the line and results in a quirk that can break extracting
expressions into functions.
> That is to say, the arguments
> to the "operator+" function get evaluated before the 'emplace' method
> is invoked. Looking at your original 'foo' function again:
>
> void foo(std::optional<std::string>& opt_str)
> {
> if ( opt_str.has_value() )
> {
> opt_str.emplace( std::move(*opt_str) + " World!" );
> }
> }
>
> After PRvalue Parameters have been added to the language, the x86_64
> assembler on Linux for 'foo' will be something like as follows. (Note
> on Linux that the first 3 arguments go in RDI, RSI, RCX, and the
> return value goes in RAX).
>
> Foo: ; address of opt_str is in RDI
> push rdi ; save address of opt_str for later
> call optional<string>::has_value ; returned boolean goes in RAX
> test rax, rax ; check if has_value returned false
> jnz has_value_is_true ; if nonzero jump to has_value_is_true
> pop rdi ; just to restore stack pointer
> ret ; return from function
> has_value_is_true:
> mov rdi, [rsp] ; set 1st param = address of opt_str
> call optional<string>::operator* ; return value goes in RAX
> pop rdi ; address of opt_str in 1st argument
> position again
> push "World!" ; 2nd argument wrapped to helper function is "World!"
> push rax ; 1st argument wrapped to helper function
> is address of *opt_str
> mov rsi, Helper_Function ; set 2nd argument to address of
> Helper_Function
> mov rcx, rsp ; set 3rd argument to current stack pointer
> call optional<string>::emplace(string&&&)
> sub rsp, 16 ; just to restore the stack
> ret
>
> Okay . . . having written this out . . . . I can see now where the problem is:
> The address of '*opt_str' is passed to the Helper_Function, and it
> is the 'emplace' method that will invoke the Helper_Function, but the
> 'emplace' method will invoke "this->reset()" before invoking
> Helper_Function, meaning that Helper_Function will be given the
> address of a string that has already been destroyed. I wonder if
> there's a remedy to this. Specifically the problem is as follows:
> (1) If we destroy the string before calling the helper, then we
> get a segfault
> (2) If we don't destroy the string before calling the helper, then
> we get a memory leak
>
> I don't think that No. 1 can be remedied at all -- we can't query an
> object after its memory has been deallocated. But I think that No. 2
> can be remedied if we relocate the object in order to destroy it later
> at a different location. So the chain of events inside the 'emplace'
> method would be:
> (1) Relocate "*opt_str" to another memory address
> (2) Invoke the helper function (thus overwriting the string in-place)
> (3) Destroy the relocated string
>
> So then the implementation of 'emplace(string&&&)' would become something like:
>
> template<typename T>
> T &optional<T>::emplace(T &&&arg)
> {
> if ( false == this->has_value() )
> {
> T &retval = *::new(this->buffer) T( arg() );
> this->has_value_bool = true;
> return retval;
> }
>
> this->has_value_bool = false;
>
> alignas(string) char unsigned reloc_buf[ sizeof(string) ];
> std::relocate( (string*)reloc_buf, &this->value() );
> Auto( ((string*)reloc_buf)->~string() );
>
> T &retval = *::new(this->buffer) T( arg() );
> this->has_value_bool = true;
>
> return retval;
> }
>
> Could something like that work?
No, `arg()` doesn't know anything about your relocated string. This is
would also be only a partial remedy, as I can have an example where the
deferred function call evaluates `opt_str.has_value()`, which has a
different return value with the function call not being deferred.
Cheers,
Lénárd
> On Wed, Jun 5, 2024 at 12:02 PM Lénárd Szolnoki wrote:
>>
>> No, that's not what I am saying. In your paper optional::emplace is
>> implemented the following way:
>>
>> template<typename T>
>> T &optional<T>::emplace(T &&&arg)
>> {
>> this->reset();
>> T &retval = *::new(this->buffer) T( arg() );
>> this->has_value_bool = true;
>> return retval;
>> }
>>
>> This is what happens when called with the "deferred" or "lazy" argument
>> `std::move(*opt_str) + " World!`.
>
>
> The expression:
>
> std::move(*opt_str) + " World!"
>
> is a function call as follows:
>
> operator+( std::move(*opt_str), " World!" );
>
> and it returns a PRvalue of type 'std::string'.
>
> As I wrote in my paper under the header "x86_64 Linux implementation",
> the compiler must emit a helper function that moves the arguments from
> memory to registers, as follows:
>
> Helper_Function: ; address of return slot is in RDI
> mov rax, rsi ; address of arguments in memory
> mov rsi, [rax+0] ; move address of "*opt_str" from
> memory to register
> mov rcx, [rax+8] ; move address of " World!" from
> memory to register
> jmp operator+(string&&,char*)
>
> More on this later. . .
>
>
>> 1. this->reset()
>> After this the optional doesn't hold a value anymore.
>>
>> 2. T& retval = *::new(this->buffer) T( arg() );
>> Here `arg()` evaluates `std::move(*opt_str) + " World!`. But `opt_str`
>> has no value anymore, therefore `*opt_str` is UB.
>
>
> The arguments to the-function-which-returns-the-PRvalue get evaluated
> before the 'emplace' method is invoked.
You are right, this is something I missed in the proposal. To be honest,
I find this a rather weird design choice. Admittedly this can still
mitigate the issue in some particular cases, but as you also found out
down below, not here.
Anyway, even if it did solve the issue, if for some reason you needed a
more complicated expression than `std::move(*opt_str) + " World!"` and
then decide to extract that into a function that takes a reference to an
optional then you are back to the original issue again, like so:
opt_str.emplace(append_world(opt_str));
Only lazily evaluating the "top-most" expression feels like a weird
place to draw the line and results in a quirk that can break extracting
expressions into functions.
> That is to say, the arguments
> to the "operator+" function get evaluated before the 'emplace' method
> is invoked. Looking at your original 'foo' function again:
>
> void foo(std::optional<std::string>& opt_str)
> {
> if ( opt_str.has_value() )
> {
> opt_str.emplace( std::move(*opt_str) + " World!" );
> }
> }
>
> After PRvalue Parameters have been added to the language, the x86_64
> assembler on Linux for 'foo' will be something like as follows. (Note
> on Linux that the first 3 arguments go in RDI, RSI, RCX, and the
> return value goes in RAX).
>
> Foo: ; address of opt_str is in RDI
> push rdi ; save address of opt_str for later
> call optional<string>::has_value ; returned boolean goes in RAX
> test rax, rax ; check if has_value returned false
> jnz has_value_is_true ; if nonzero jump to has_value_is_true
> pop rdi ; just to restore stack pointer
> ret ; return from function
> has_value_is_true:
> mov rdi, [rsp] ; set 1st param = address of opt_str
> call optional<string>::operator* ; return value goes in RAX
> pop rdi ; address of opt_str in 1st argument
> position again
> push "World!" ; 2nd argument wrapped to helper function is "World!"
> push rax ; 1st argument wrapped to helper function
> is address of *opt_str
> mov rsi, Helper_Function ; set 2nd argument to address of
> Helper_Function
> mov rcx, rsp ; set 3rd argument to current stack pointer
> call optional<string>::emplace(string&&&)
> sub rsp, 16 ; just to restore the stack
> ret
>
> Okay . . . having written this out . . . . I can see now where the problem is:
> The address of '*opt_str' is passed to the Helper_Function, and it
> is the 'emplace' method that will invoke the Helper_Function, but the
> 'emplace' method will invoke "this->reset()" before invoking
> Helper_Function, meaning that Helper_Function will be given the
> address of a string that has already been destroyed. I wonder if
> there's a remedy to this. Specifically the problem is as follows:
> (1) If we destroy the string before calling the helper, then we
> get a segfault
> (2) If we don't destroy the string before calling the helper, then
> we get a memory leak
>
> I don't think that No. 1 can be remedied at all -- we can't query an
> object after its memory has been deallocated. But I think that No. 2
> can be remedied if we relocate the object in order to destroy it later
> at a different location. So the chain of events inside the 'emplace'
> method would be:
> (1) Relocate "*opt_str" to another memory address
> (2) Invoke the helper function (thus overwriting the string in-place)
> (3) Destroy the relocated string
>
> So then the implementation of 'emplace(string&&&)' would become something like:
>
> template<typename T>
> T &optional<T>::emplace(T &&&arg)
> {
> if ( false == this->has_value() )
> {
> T &retval = *::new(this->buffer) T( arg() );
> this->has_value_bool = true;
> return retval;
> }
>
> this->has_value_bool = false;
>
> alignas(string) char unsigned reloc_buf[ sizeof(string) ];
> std::relocate( (string*)reloc_buf, &this->value() );
> Auto( ((string*)reloc_buf)->~string() );
>
> T &retval = *::new(this->buffer) T( arg() );
> this->has_value_bool = true;
>
> return retval;
> }
>
> Could something like that work?
No, `arg()` doesn't know anything about your relocated string. This is
would also be only a partial remedy, as I can have an example where the
deferred function call evaluates `opt_str.has_value()`, which has a
different return value with the function call not being deferred.
Cheers,
Lénárd
Received on 2024-06-05 13:52:22