Date: Tue, 03 Jan 2023 19:11:24 +0000
On 02/01/2023 02:31, Jeff Garland via SG16 wrote:
> Thanks for posting this Tom — really good paper input :) I’ve indeed
> started drafting an outline and this is *way harder/worse* than I hoped.
> I *think* I agree with most everything that’s been said so far —
> here’s my summary of directional thoughts so far:
>
> 1) It is a non-negotiable requirement for std::process — a different
> proposal I’ll be restarting — to write the environment before
> launching
> a child process — Ville and Jake's points. There simply is no standard
> way to do it currently — putenv, unsetenv is posix only. No matter
> what I think this should be fixed.
I don't know why you think you need to modify the current process
environment when starting a new process. You can take a copy of the
current environment, modify that **copy**, and supply that to the new
process. This works absolutely fine: LLFIO's process_handle does this
and it "just works".
As you mentioned, modifying the current process environment is highly
unwise, because it's fundamentally racy and there is zero way to avoid
raciness due to lack of support in the syscall API. It's best to avoid
ever doing that.
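
To make the copy-then-modify approach concrete, here is a minimal POSIX sketch. It is not LLFIO's implementation: copy_env_with and spawn_with_env are hypothetical helper names, the environment is assumed to be a null-terminated array of "NAME=value" entries, and posix_spawnp stands in for whatever launch primitive is actually used. The parent's array is only read, never written.

```cpp
#include <spawn.h>
#include <sys/types.h>

#include <string>
#include <string_view>
#include <vector>

// Copy `parent` (a null-terminated array of "NAME=value" entries) and set
// NAME=value in the copy only. The source array is never modified.
std::vector<std::string> copy_env_with(char* const* parent,
                                       std::string_view name,
                                       std::string_view value) {
    const std::string entry = std::string(name) + "=" + std::string(value);
    std::vector<std::string> out;
    bool replaced = false;
    for (; parent && *parent; ++parent) {
        std::string_view e(*parent);
        // An entry matches when it starts with NAME immediately followed by '='.
        const bool match = e.size() > name.size() &&
                           e.substr(0, name.size()) == name &&
                           e[name.size()] == '=';
        if (!match) {
            out.emplace_back(e);
        } else if (!replaced) {
            out.push_back(entry);  // replace the first match, drop later ones
            replaced = true;
        }
    }
    if (!replaced) out.push_back(entry);
    return out;
}

// Hypothetical wrapper: launch `file` with the copied environment.
// Returns 0 on success, otherwise an error number (as posix_spawnp does).
int spawn_with_env(const char* file, char* const argv[],
                   std::vector<std::string>& env, pid_t* pid) {
    std::vector<char*> envp;
    for (auto& s : env) envp.push_back(s.data());
    envp.push_back(nullptr);  // envp must be null-terminated
    return posix_spawnp(pid, file, nullptr, nullptr, argv, envp.data());
}
```

On POSIX the first argument would typically be `environ`. Taking that snapshot is still only safe if no other thread is modifying the environment at the time, which is one more reason never to modify it.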
> 3) Aside from the obvious pointer usability issues, getenv isn’t a great
> way to ‘discover’ what’s actually set in the environment. Keeping
> in
> mind that often — ‘not set’ has meaning. Currently you'd need to
> ask
> one at a time. I think there’s some value to a range-based facility to
> grab a set of variables (even a post on reddit this week of a lib that
> can grab a filtered set).
getenv() is, weirdly enough, not async-signal-safe. Most will consider
that unimportant, which is true until suddenly it is extremely
important. It would be really great if C++'s implementation gained
async-signal safety.
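
As a rough illustration of the range-style "grab a filtered set" idea from the quoted text, here is a minimal sketch. env_with_prefix is a hypothetical helper, not a proposed interface. It assumes a null-terminated array of entries, compares raw bytes only, and makes no attempt at signal safety (it allocates).

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper: collect every entry of a null-terminated environment
// array whose raw bytes start with `prefix`. The returned views point into
// the array, so they are valid only while the array is left unchanged.
std::vector<std::string_view> env_with_prefix(char* const* env,
                                              std::string_view prefix) {
    std::vector<std::string_view> out;
    for (; env && *env; ++env) {
        std::string_view e(*env);
        // substr clamps to the entry length, so short entries never match
        // a longer prefix.
        if (e.substr(0, prefix.size()) == prefix) out.push_back(e);
    }
    return out;
}
```

An empty result then cleanly expresses "not set" for the whole group, without asking for each variable one at a time.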
Some further points:
1. path_view makes a lot more sense than using paths. Indeed, LLFIO's
process_handle considers "an environment" to be a
unique_ptr<span<path_view_component>, impl_defined_deleter>. LLFIO does
not interpret '=' for you, so entries will likely be "VAR=", "VAR=val"
or "VAR". In the real world, I've also seen "=val" and "". I think any
attempt by the standard to force interpretation for those is unwise -
just slap "implementation defined" on what the environment entry format
is, and perhaps supply a static constexpr separator value for this
platform (i.e. just like filesystem path).
2. I think assuming environment variables will be in text is as bad an
assumption as assuming filesystem entries will be in text. Nobody
requires the values to be text, and shell scripts can and do pass
unencoded bytes there, which works fine so long as they never contain a
null byte. That sounds awfully like filesystem entries, with '=' playing
roughly the role of '/', except that only the first '=' has meaning.
3. POSIX says that implementations must be able to tolerate
environment variable names not being in the portable character set, so
any value other than '=' could be possible for names as well as for
values. I have to admit I haven't seen much of that in the wild, because
shell interpreters would generally puke badly, but I have seen it be
used to detect where some intervening code layer is performing a Unicode
transcoding between child processes (i.e. it will be mangling your data
pipes, so you need to fall back to 7-bit clean transmission).
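
To show what "don't force interpretation" can look like in practice, here is a minimal sketch that treats an entry as opaque bytes and splits only at the first separator. The names env_entry, env_separator and split_env_entry are hypothetical and are not taken from LLFIO or any proposal. The separator is hard-coded to '=' as an assumption about this platform.

```cpp
#include <optional>
#include <string_view>

// Assumed platform separator, by analogy with a filesystem path separator.
inline constexpr char env_separator = '=';

struct env_entry {
    std::string_view name;                  // bytes before the first separator
    std::optional<std::string_view> value;  // nullopt when no separator exists
};

// Split a raw entry at the FIRST separator only. No encoding is assumed and
// nothing is validated: "VAR=", "VAR", "=val" and "" all pass through.
inline env_entry split_env_entry(std::string_view raw) {
    const auto pos = raw.find(env_separator);
    if (pos == std::string_view::npos) return {raw, std::nullopt};
    return {raw.substr(0, pos), raw.substr(pos + 1)};
}
```

Keeping "no separator" distinct from "empty value" preserves the difference between "VAR" and "VAR=", and later separators stay part of the value, so "A=b=c" yields the name "A" and the value "b=c".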
As I mentioned before re: the process proposal, if WG21 likes the
low-level i/o direction for socket i/o and file i/o, it would make sense
to keep going with pipe i/o and subprocess i/o. LLFIO's process_handle is
tedious to use as it's very low level, so a high-level wrapper such as a
std::process would be most useful. Still, the ability to supply a
subprocess as if it were a socket without having to recompile code can
be extremely useful, and that's why I'd suggest this approach.
Niall
Received on 2023-01-03 19:11:25