On Mon, Jan 2, 2023 at 3:31 AM Jeff Garland <jeff@crystalclearsoftware.com> wrote:

Thanks for posting this Tom - really good paper input :) I’ve indeed started drafting an outline and this is *way harder/worse* than I hoped. I *think* I agree with most everything that’s been said so far - here’s my summary of directional thoughts so far:

1) It is a non-negotiable requirement for std::process - a different proposal I’ll be restarting - to be able to write the environment before launching a child process (Ville and Jake's points). There simply is no standard way to do it currently: putenv and unsetenv are POSIX-only. No matter what, I think this should be fixed.

There are 2 features here:

* The ability to set environment variables on a child process (in effect a fancy range of pairs of strings that can be seeded from the environment of the current process)

* The ability to query the environment

I think we should keep in mind that they are separate use cases (they may call for similar or identical features, but we should understand they are separate).

The process's environment is just what exec and similar default to when an environment is not provided, and you need not modify the current environment to modify the environment of a child.
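A minimal sketch of that separation, assuming a POSIX platform where the `environ` variable is available. The names `env_map`, `snapshot_environment`, `to_env_strings` and `to_envp` are hypothetical and not part of any proposal; the resulting `envp` array has the shape that calls such as `execve` or `posix_spawn` take as their environment argument.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// POSIX: the environment block of the current process (assumption: not Windows).
extern "C" char** environ;

// Hypothetical alias: an environment is just name/value string pairs.
using env_map = std::map<std::string, std::string>;

// Copy the current environment into an ordinary map. This is a snapshot,
// not a live view; later changes to the process environment are not seen.
env_map snapshot_environment() {
    env_map out;
    for (char** p = environ; p && *p; ++p) {
        std::string entry(*p);
        auto eq = entry.find('=');
        if (eq == std::string::npos || eq == 0) continue;  // skip malformed entries
        out.emplace(entry.substr(0, eq), entry.substr(eq + 1));
    }
    return out;
}

// Flatten the map into "NAME=value" strings.
std::vector<std::string> to_env_strings(const env_map& env) {
    std::vector<std::string> out;
    out.reserve(env.size());
    for (const auto& [name, value] : env) out.push_back(name + "=" + value);
    return out;
}

// Build the null-terminated pointer array expected as envp.
// The pointers stay valid only while `strings` is alive and unmodified.
std::vector<char*> to_envp(std::vector<std::string>& strings) {
    std::vector<char*> envp;
    envp.reserve(strings.size() + 1);
    for (auto& s : strings) envp.push_back(s.data());
    envp.push_back(nullptr);
    return envp;
}
```

A caller would take a snapshot, add or remove entries in the map, and pass the flattened block to the launch call. The current process's own environment is never written to.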

2) In my survey of other languages, basically none of them can really solve the thread safety issues. Why? Because it’s not supported by the platforms -- so I think it's literally not possible. You can have 3 Rust threads all mutexed on shared environment while a C thread does whatever it wants and messes them up. Their documentation is very clear on this point — read Jake’s points.

So as Jonathan indicates, we could create an API that takes a snapshot of the environment that can be accessed safely - but of course that won’t be accurate if another thread changes it. To my mind, this doesn’t fix the problem. Unfortunately this is unprotected global state, and short of a thread-safe platform API, C++ probably can’t fix it. So sure, we could provide some API that keeps a C++ program from stomping on itself if it’s all C++ - maybe that’s worth it so that C++ libraries and user code could count on that facility? It's still not going to protect C++ threads from bad interactions with threads in C or other languages.
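To make the limitation concrete, here is a minimal sketch of such a C++-only facility, assuming POSIX `setenv` is available. The names `env_mutex`, `locked_getenv` and `locked_setenv` are hypothetical. The lock only serializes callers that go through these wrappers; any thread that calls `getenv`, `setenv` or `putenv` directly bypasses it, which is exactly the gap described above.

```cpp
#include <cstdlib>
#include <mutex>
#include <optional>
#include <stdlib.h>  // ::setenv is POSIX, not ISO C or ISO C++
#include <string>

// Hypothetical process-wide lock shared by the wrappers below.
inline std::mutex& env_mutex() {
    static std::mutex m;
    return m;
}

// Read a variable and copy its value while the lock is held, so the
// returned string does not alias storage that a later write may replace.
std::optional<std::string> locked_getenv(const std::string& name) {
    std::lock_guard<std::mutex> lock(env_mutex());
    const char* value = std::getenv(name.c_str());
    if (!value) return std::nullopt;
    return std::string(value);
}

// Write a variable under the same lock. Returns true on success.
bool locked_setenv(const std::string& name, const std::string& value) {
    std::lock_guard<std::mutex> lock(env_mutex());
    return ::setenv(name.c_str(), value.c_str(), /*overwrite=*/1) == 0;
}
```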

A further issue is that putenv is non-standard so fixing that in C or C++ will be challenging.

Things I've considered in the past:

* Doing a copy at launch, but that seems extremely wasteful

* Standardizing putenv and mandating some kind of thread safety - I wonder how that would go

* Standardizing some "get_parent_env" that would be immutable (which means putenv/getenv would have to be modified to maintain a list of mutations)

3) Aside from the obvious pointer usability issues, getenv isn’t a great way to ‘discover’ what’s actually set in the environment. Keep in mind that ‘not set’ often has meaning. Currently you'd need to ask one at a time. I think there’s some value in a range-based facility to grab a set of variables (there was even a post on Reddit this week about a lib that can grab a filtered set).
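As a rough illustration of both points, the sketch below returns `std::optional` so that "not set" stays distinct from "set to an empty string", and collects every variable whose name starts with a given prefix in one pass. The names `lookup` and `variables_with_prefix` are hypothetical. The block to scan is passed in explicitly, which is an assumption made here so the filter can be exercised without touching the real environment.

```cpp
#include <cstdlib>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// nullopt means "not set"; an engaged but empty string means "set to empty".
std::optional<std::string> lookup(const char* name) {
    const char* value = std::getenv(name);
    if (!value) return std::nullopt;
    return std::string(value);
}

// Collect all "NAME=value" entries in a null-terminated block whose NAME
// starts with `prefix`. On POSIX the block could be `environ`.
std::map<std::string, std::string>
variables_with_prefix(const char* const* block, std::string_view prefix) {
    std::map<std::string, std::string> out;
    for (; block && *block; ++block) {
        std::string_view entry(*block);
        auto eq = entry.find('=');
        if (eq == std::string_view::npos) continue;  // skip malformed entries
        std::string_view name = entry.substr(0, eq);
        if (name.substr(0, prefix.size()) != prefix) continue;
        out.emplace(std::string(name), std::string(entry.substr(eq + 1)));
    }
    return out;
}
```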

I’ll now assume all the respondents to this thread will be early feedback candidates :)

Jeff

On Jan 1, 2023, at 7:16 PM, Jake Arkinstall <jake.arkinstall@gmail.com> wrote:

On Mon, 2 Jan 2023, 00:36 Corentin Jabot via Lib-Ext, <lib-ext@lists.isocpp.org> wrote:
(putenv is not a system facility, it has no effect whatsoever on the system, it only affects the current process)

And, I believe, processes spawned from the current process, such as via fork. It also has an effect on std::system calls. I don't use this functionality but I imagine that, if I did, I'd want the ability to control its environment.

Environment variables are predominantly used as input - even more so in recent years - but they can be used as "output" too. Otherwise they wouldn't be provided to be read as input to begin with.

So if we want to be able to launch processes but don't want to mutate the environment, we need a mechanism to pass environment variables to the processes.

It would also make sense that, if we must have an immutable global pseudo-environment, we allow transformations that provide new pseudo-environments that we can pass around. At the very least, std::system should gain an overload that accepts a pseudo-environment that results from this.
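One way such transformations might look, as a sketch only: an immutable value type whose operations return a new object instead of modifying the original. The class name `pseudo_environment` and its members `get`, `with` and `without` are hypothetical, and no `std::system` overload accepting such an object exists today.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical immutable pseudo-environment: every "modification"
// yields a new object and leaves the original untouched.
class pseudo_environment {
public:
    pseudo_environment() = default;

    // nullopt means the variable is not set in this pseudo-environment.
    std::optional<std::string> get(const std::string& name) const {
        auto it = vars_.find(name);
        if (it == vars_.end()) return std::nullopt;
        return it->second;
    }

    // Return a copy with `name` set to `value`.
    pseudo_environment with(const std::string& name, const std::string& value) const {
        auto copy = vars_;
        copy[name] = value;
        return pseudo_environment(std::move(copy));
    }

    // Return a copy with `name` removed.
    pseudo_environment without(const std::string& name) const {
        auto copy = vars_;
        copy.erase(name);
        return pseudo_environment(std::move(copy));
    }

private:
    explicit pseudo_environment(std::map<std::string, std::string> vars)
        : vars_(std::move(vars)) {}

    std::map<std::string, std::string> vars_;
};
```

Because each object is never modified after construction, it can be shared between threads without locking, at the cost of a copy per transformation.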

Are we providing a way to query the environment or a global map of strings?

The above being said, I would still find it surprising if a call to putenv didn't change the output from something that is supposed to abstract and modernise environment access. I imagine many others would.

I would also find it surprising if something that is supposed to abstract and modernise environment access only provided read-only functionality, especially if the justification of that was based on the premise that reading and writing isn't thread safe. Anyone wanting to write would be forced to use setenv or putenv anyway, and the problem still emerges.

Or, more generally:

- We have an unsafe thing.
- We make a safe alternative that is incomplete because some parts weren't possible to do in a safe manner.
- We still have the unsafe thing because we don't break things for users.
- Therefore we still have an unsafe thing.

We can only add footguns. We can't take them away. I worry that trying to avoid footguns by pushing users to use the old ones, and then making any kind of decision based on the wrong assumption that the footgun is dealt with, can only lead to new footguns.

In the case of environment variables, this is just the nature of global state managed by the operating system. The safety is at the hands of the operating system. Rust has its own battles with it, too - and recent ones, resulting from CVEs relating to setenv safety. https://github.com/rust-lang/rust/issues/90308 tentatively concludes with removing the safety guarantee of setting environment variables. We don't have a concept of safety in the language, but I feel that avoiding the functionality altogether for that reason doesn't make sense.

So to actually get to your question, I think storing state and hoping existing code doesn't change it via the features already available to it is a bad move, and I think it should be the former: a way to query the current environment.

If, however, the alternate path is chosen, then it needs to be named well. Environment variables are mutable. If what the user is querying is actually the environment variables at startup, then it should be named accordingly to avoid confusion.