Date: Mon, 2 Jan 2023 11:22:17 +0100
On Mon, Jan 2, 2023 at 3:31 AM Jeff Garland <jeff_at_[hidden]>
wrote:
> Thanks for posting this Tom — really good paper input :) I’ve indeed
> started drafting an outline and this is *way harder/worse* than I hoped. I
> *think* I agree with most everything that’s been said so far — here’s my
> summary of directional thoughts so far:
>
> 1) It is a non-negotiable requirement for std::process — a different
> proposal I’ll be restarting — to write the environment before launching a
> child process — Ville and Jake's points. There simply is no standard way
> to do it currently: putenv and unsetenv are POSIX-only. No matter what I
> think this should be fixed.
>
There are 2 features here:
* The ability to set environment variables on a child process (in
effect a fancy range of pairs of strings that can be seeded by the
environment of the current process)
* The ability to query the environment
I think we should keep in mind they are separate use cases (they may
call for similar or identical features, but we should understand they are
separate).
The process's environment is just what exec and similar default to when an
environment is not provided, and you need not modify the current
environment to modify the environment of a child.
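To make that concrete, here is a minimal sketch, not proposed API. The
names env_map, snapshot and to_entries are hypothetical. It copies a
null-terminated "NAME=value" array (the shape of POSIX environ) into an
owned map, lets the caller edit the copy, and flattens it back for a
launch call. The current process's environment is never written to.

```cpp
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical name: an owned, editable copy of an environment.
using env_map = std::map<std::string, std::string>;

// Copy a null-terminated array of "NAME=value" strings into an env_map.
// Entries without '=' or with an empty name are skipped.
inline env_map snapshot(const char* const* envp) {
    env_map out;
    if (!envp) return out;
    for (; *envp; ++envp) {
        const std::string_view entry{*envp};
        const auto eq = entry.find('=');
        if (eq == std::string_view::npos || eq == 0) continue;
        out.insert_or_assign(std::string(entry.substr(0, eq)),
                             std::string(entry.substr(eq + 1)));
    }
    return out;
}

// Flatten the map back to "NAME=value" strings, sorted by name.
inline std::vector<std::string> to_entries(const env_map& env) {
    std::vector<std::string> out;
    out.reserve(env.size());
    for (const auto& [name, value] : env) out.push_back(name + "=" + value);
    return out;
}
```

On POSIX, the strings returned by to_entries could back the char* array
passed as envp to execve or posix_spawn; other platforms would need a
different final step.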
>
> 2) In my survey of other languages, basically none of them can really
> solve the thread safety issues. Why? Because it’s not supported by the
> platforms -- so I think it's literally not possible. You can have 3 Rust
> threads all mutexed on shared environment while a C thread does whatever it
> wants and messes them up. Their documentation is very clear on this point
> — read Jake’s points.
>
> So as Jonathan indicates we could create an api that does a snapshot of
> the environment that can be accessed safely, but of course that won’t be
> accurate if another thread changes it. To my mind, this doesn’t fix the
> problem. Unfortunately this is unprotected global state and short of a
> thread safe platform api c++ probably can’t fix it. So sure, we could
> provide some API that limits a C++ program from stomping on itself if it’s
> all C++ — maybe that’s worth it so that c++ libraries and user code could
> count on that facility? Still not going to fix threads in C or other
> languages from bad interactions with C++ threads.
>
A further issue is that putenv is non-standard so fixing that in C or C++
will be challenging.
Things I've considered in the past:
* Doing a copy at launch, but that seems extremely wasteful
* Standardizing putenv and mandating some kind of thread safety - I
wonder how that would go
* Standardizing some "get_parent_env" that would be immutable (which
means putenv/getenv would have to be modified to maintain a list of
mutations)
>
> 3) Aside from the obvious pointer usability issues, getenv isn’t a great
> way to ‘discover’ what’s actually set in the environment. Keeping in mind
> that often ‘not set’ has meaning. Currently you'd need to ask one at a
> time. I think there’s some value to a range based facility to grab a set of
> variables (even a post on reddit this week of a lib that can grab a
> filtered set).
>
+1
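As a rough illustration of what such a range-based query might look like
(a sketch only; select_variables, env_entry and has_prefix are made-up
names, and a real facility would more likely expose a lazy range than
return a vector):

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical name: one (name, value) pair from the environment.
using env_entry = std::pair<std::string, std::string>;

// Walk a null-terminated "NAME=value" array once and keep the variables
// whose name satisfies the predicate. Unset variables simply do not
// appear, which is distinguishable from a variable set to "".
template <class Pred>
std::vector<env_entry> select_variables(const char* const* envp, Pred keep) {
    std::vector<env_entry> out;
    if (!envp) return out;
    for (; *envp; ++envp) {
        const std::string_view entry{*envp};
        const auto eq = entry.find('=');
        if (eq == std::string_view::npos || eq == 0) continue;
        const std::string_view name = entry.substr(0, eq);
        if (keep(name))
            out.emplace_back(std::string(name),
                             std::string(entry.substr(eq + 1)));
    }
    return out;
}

// Small helper for prefix filters such as "LC_".
inline bool has_prefix(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}
```

A caller would pass a predicate such as
[](std::string_view n) { return has_prefix(n, "LC_"); } to get all the
locale variables in one pass instead of one getenv call per name.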
>
> I’ll now assume all the respondents to this thread will be early feedback
> candidates :)
>
> Jeff
>
> On Jan 1, 2023, at 7:16 PM, Jake Arkinstall <jake.arkinstall_at_[hidden]>
> wrote:
>
> On Mon, 2 Jan 2023, 00:36 Corentin Jabot via Lib-Ext, <
> lib-ext_at_[hidden]> wrote:
>
>> (putenv is not a system facility, it has no effect whatsoever on the
>> system; it only affects the current process)
>>
>
> And, I believe, processes spawned from the current process, such as via
> fork. It also has an effect on std::system calls. I don't use this
> functionality but I imagine that, if I did, I'd want the ability to control
> its environment.
>
> Environment variables are predominantly used as input - even more so in
> recent years - but they can be used as "output" too. Otherwise they
> wouldn't be provided to be read as input to begin with.
>
> So if we want to be able to launch processes but don't want to mutate the
> environment, we need a mechanism to pass environment variables to the
> processes.
>
> It would also make sense that, if we must have an immutable global
> pseudo-environment, we allow transformations that provide new
> pseudo-environments that we can pass around. At the very least, std::system
> should gain an overload that accepts a pseudo-environment that results from
> this.
>
>> Are we providing a way to query the environment or a global map of strings?
>>
>
> The above being said, I would still find it surprising if a call to putenv
> didn't change the output from something that is supposed to abstract and
> modernise environment access. I imagine many others would.
>
> I would also find it surprising if something that is supposed to abstract
> and modernise environment access only provided read-only functionality,
> especially if the justification of that was based on the premise that
> reading and writing isn't thread safe. Anyone wanting to write would be
> forced to use setenv or putenv anyway, and the problem still emerges.
>
> Or, more generally:
>
> - We have an unsafe thing.
> - We make a safe alternative that is incomplete because some parts weren't
> possible to do in a safe manner
> - We still have the unsafe thing because we don't break things for users.
> - Therefore we still have an unsafe thing.
>
> We can only add footguns. We can't take them away. I worry that trying to
> avoid footguns by pushing users to use the old ones, and then making any
> kind of decision based on the wrong assumption that the footgun is dealt
> with, can only lead to new footguns.
>
> In the case of environment variables, this is just the nature of global
> state managed by the operating system. The safety is at the hands of the
> operating system. Rust has its own battles with it, too - and recent ones,
> resulting from CVEs relating to setenv safety.
> https://github.com/rust-lang/rust/issues/90308 tentatively concludes with
> removing the safety guarantee of setting environment variables. We don't
> have a concept of safety in the language, but I feel that avoiding the
> functionality altogether for that reason doesn't make sense.
>
> So to actually get to your question, I think storing state and hoping
> existing code doesn't change it via the features already available to it is
> a bad move, and I think it should be the former: a way to query the current
> environment.
>
> If, however, the alternate path is chosen, then it needs to be named well.
> Environment variables are mutable. If what the user is querying is actually
> the environment variables at startup, then it should be named accordingly
> to avoid confusion.
>
>
>
Received on 2023-01-02 10:22:30