sg16

std::environment

From: Tom Honermann <tom_at_[hidden]>
Date: Sun, 1 Jan 2023 17:51:35 -0500
Happy New Year!

Jeff Garland recently indicated
<https://github.com/cplusplus/papers/issues/329#issuecomment-1357753333>
that he intends to restart work on std::environment, an improved
facility for interacting with environment variables. The following
describes the relevant history that I'm aware of, along with my own
thoughts and preferences for a future proposal.


  History

Previous discussions have been in the context of these two papers:

  * P1275 <https://wg21.link/p1275>: Desert Sessions: Improving hostile
    environment interactions
  * P1750 <https://wg21.link/p1750>: A Proposal to Add Process
    Management to the C++ Standard Library

Records of previous discussion are available at the links below.
Relevant std::environment related design polls are included inline
(other polls are omitted but can be found in the linked records of
discussion).

  * 2018-11-07, San Diego, LEWGI discussion of P1275
    <https://wiki.edg.com/bin/view/Wg21sandiego2018/P1275>.
      o POLL: std::environment should be immutable
        Attendance: 11
         SF |  F |  N |  A | SA
          4 |  4 |  1 |  1 |  1

  * 2018-11-08, San Diego, SG16 discussion of P1275
    <https://wiki.edg.com/bin/view/Wg21sandiego2018/P1275R0>.
      o P1275R0: std::environments and std::arguments should follow the
        precedent set by std::filesystem::path.
        Attendance: 14
         SF |  F |  N |  A | SA
          4 |  6 |  1 |  0 |  2

      o P1275R0: std::environment and std::arguments should return a
        bag-o-bytes and conversion is up to the user.
        Attendance: 14
         SF |  F |  N |  A | SA
          3 |  4 |  2 |  1 |  2

      o The first poll had stronger consensus.
  * 2019-06-26 SG16 telecon
    <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2019.md#june-26th-2019>.
  * 2019-07-15, Cologne, LEWGI discussion of P1750
    <https://wiki.edg.com/bin/view/Wg21cologne2019/P1750>.
  * 2019-07-17, Cologne, SG16 discussion of P1750
    <https://wiki.edg.com/bin/view/Wg21cologne2019/SG16P1750R0>.
  * 2019-07-19, Cologne, LEWG discussion of P1750
    <https://wiki.edg.com/bin/view/Wg21cologne2019/P1750R0>.
  * 2019-07-31 SG16 telecon
    <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2019.md#july-31st-2019>.
  * 2020-02-04, Prague, SG16 discussion of P1750
    <https://wiki.edg.com/bin/view/Wg21prague/SG16P1750R1>.
  * 2020-05-13 SG16 telecon
    <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2020.md#may-13th-2020>.

The remainder of this message discusses design and implementation
concerns that I would like to see discussed in a future paper on this
topic. These reflect my personal thoughts, concerns, and preferences;
SG16 chair hat off.


  The Microsoft C run-time library (RTL) maintains multiple environment
  blocks

The Windows operating system maintains a single wchar_t-based
environment block per process that can be directly manipulated using
Win32 APIs. This environment block is henceforth referred to as the
Win32 environment block.

Microsoft's C RTL maintains up to two copies of the Win32 environment
block, one in char-based storage and another in wchar_t-based storage.
The need for such copies arose from compatibility with historic C and
POSIX interfaces:

  * C specifies getenv() for access to environment variables using
    char-based interfaces. Microsoft added _wgetenv() to provide a
    wchar_t-based interface.
    https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/getenv-wgetenv?view=msvc-170
  * POSIX specifies an environ global variable to grant raw access to
    the environment block (including for mutation). The expected format
    and access provisions do not necessarily correspond to how the Win32
    environment block is maintained; a copy is therefore required.
    Microsoft provides the POSIX behavior via an _environ global
    variable. Microsoft added _wenviron to provide a wchar_t-based
    interface as well.
    https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08
    https://learn.microsoft.com/en-us/cpp/c-runtime-library/environ-wenviron?view=msvc-170
  * Various implementations have historically provided raw access to the
    environment block via an envp argument passed to main(). As with the
    environ global variable, the expected format and access provisions
    do not necessarily correspond to how the Win32 environment block is
    maintained thus necessitating a copy. Microsoft also provides such
    access via an envp argument to its wmain() function.
    https://learn.microsoft.com/en-us/cpp/c-language/arguments-to-main?view=msvc-170
    https://learn.microsoft.com/en-us/cpp/c-language/argument-description?view=msvc-170

The C RTL and Win32 maintained environment variable blocks can get out
of sync or produce surprising results for various reasons. Microsoft
documentation acknowledges this at
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/getenv-wgetenv?view=msvc-170.
Some of the ways in which the environment blocks can get out of sync
include:

  * Changes to the Win32 environment block via Win32 APIs may not be
    reflected by the C RTL interfaces.
  * The C RTL environment blocks are initially created from the Win32
    environment block, but not necessarily at the same point in time.
  * The C RTL interfaces use '?' as a wildcard when matching environment
    variable names so that variables with names that lack representation
    in the relevant character set can still be looked up.

A live demonstration of such issues is available at
https://godbolt.org/z/5WxabETYn.

A new standard interface to access the program environment presents an
opportunity to make different design choices that would not require the
environment block duplication that the Microsoft C RTL currently
performs. My preference is for a design that works directly with the
Win32 environment block (for improved interoperability with other
languages), but it is worth noting that doing so will lead to
inconsistent behavior with std::getenv().

It is reasonable to question whether this section is relevant to the
standard since it discusses the details of one particular
implementation. However, the standard is impacted with regard to what
requirements it can impose. For example, a requirement that
std::getenv("FOO") return the same result as std::environment["FOO"]
would presumably prevent Microsoft from providing a std::environment
implementation that operates directly on the Win32 environment block.


  Relationship to other interfaces

As mentioned in the previous section, there are a number of interfaces
that existing programs may use to access the environment. These include,
at least:

  * C getenv() and Microsoft _wgetenv().
  * POSIX setenv() and putenv().
  * POSIX environ.
  * Microsoft _environ and _wenviron.
  * The envp argument provided to main() (or wmain()).

The proposal should discuss thread safety with regard to these other
interfaces. I'm sure there will be wide consensus for std::environment
to be thread-safe. It might be reasonable to require a program to
perform its own synchronization if std::environment might be
concurrently used with these other interfaces by different threads.
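Program-side synchronization of the existing interfaces could look
roughly like the sketch below. It assumes a POSIX system (setenv() is not
part of ISO C++), and the names demo::locked_getenv and
demo::locked_setenv are hypothetical helpers invented for illustration.
The lock only protects callers that go through these wrappers; any other
code that calls getenv() or setenv() directly bypasses it.

```cpp
#include <cstdlib>
#include <mutex>
#include <optional>
#include <string>

#include <stdlib.h>  // setenv() is POSIX, not ISO C++

namespace demo {  // hypothetical names, not part of any proposal

// One process-wide mutex shared by every wrapper below.
inline std::mutex& env_mutex() {
    static std::mutex m;
    return m;
}

// Copies the value while the lock is held, because the pointer returned
// by std::getenv() may be invalidated by a later setenv() call.
inline std::optional<std::string> locked_getenv(const std::string& name) {
    std::lock_guard<std::mutex> lock(env_mutex());
    const char* value = std::getenv(name.c_str());
    if (value == nullptr) return std::nullopt;
    return std::string(value);
}

// Returns true on success.
inline bool locked_setenv(const std::string& name, const std::string& value) {
    std::lock_guard<std::mutex> lock(env_mutex());
    return ::setenv(name.c_str(), value.c_str(), /*overwrite=*/1) == 0;
}

}  // namespace demo
```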


  Case sensitivity and portable environment variable names

Environment variables are case sensitive in POSIX, but case insensitive
on Windows. P1275 appears to want to extend a form of case insensitivity
to all platforms, a design goal I find worrisome for several reasons.
P1275 also incorrectly states that "all platforms store their keys as
ASCII strings"; Windows at least allows environment variable names to
contain non-ASCII characters.

POSIX defines a Portable Character Set
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tagtcjh_3>
that specifies the set of characters that all implementations must
support as environment variable names. Implementations are allowed to
use an extended character set. Any proposal should specify what
characters an implementation must support; likewise, I believe the
proposed std::environment design should not restrict use of additional
characters supported by an implementation.
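As an illustration of a minimum portable name set, the sketch below
checks a name against the convention POSIX describes for names used by
its own utilities: uppercase letters, digits, and underscore, not
beginning with a digit. The function name is hypothetical, and nothing
here suggests that a proposal should reject other names; it only shows
what a conservative baseline might look like.

```cpp
#include <string_view>

namespace demo {  // hypothetical name, for illustration only

// True if 'name' uses only 'A'-'Z', '0'-'9', and '_' and does not start
// with a digit. The range comparisons assume an ASCII-compatible
// execution character set.
constexpr bool is_conservative_env_name(std::string_view name) {
    if (name.empty()) return false;
    if (name.front() >= '0' && name.front() <= '9') return false;
    for (char c : name) {
        const bool upper = c >= 'A' && c <= 'Z';
        const bool digit = c >= '0' && c <= '9';
        if (!(upper || digit || c == '_')) return false;
    }
    return true;
}

}  // namespace demo
```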


  Character encodings

Environment variable names and values, like filesystem names, are
ostensibly textual but do not have a well-defined associated character
encoding. In practice, environment variable values can hold arbitrary
binary data (8-bit on POSIX systems, 16-bit on Windows), but are often
interpreted for display purposes using a locale-sensitive character
encoding. A model similar to the one used for std::filesystem::path is
therefore appropriate: the raw environment variable names and values are
exposed via an implementation-defined value_type (char for POSIX,
wchar_t for Windows), and best-effort translations are made available
for the character encodings associated with char, wchar_t, char8_t,
char16_t, and char32_t.

Assuming a design in which std::environment enables mutation of the
environment block, there are opportunities for improvement over the
std::filesystem::path constructors. For example, it could be better
specified how character encoding conversion errors are handled.
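One way to make conversion failure explicit is to report it in the return
type. The sketch below is only a stand-in for real transcoding: it
"converts" a native string to ASCII and returns std::nullopt as soon as
it meets a code unit outside the ASCII range. The names native_char and
to_ascii are hypothetical, and a real design would have to choose among
several error policies (exceptions, replacement characters, error codes).

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>

namespace demo {  // hypothetical names, for illustration only

#ifdef _WIN32
using native_char = wchar_t;  // Win32 environment block code unit
#else
using native_char = char;     // POSIX environment code unit
#endif
using native_string_view = std::basic_string_view<native_char>;

// Returns the input as an ASCII std::string, or std::nullopt if any code
// unit is outside 0..0x7F. The failure is visible to the caller instead
// of being replaced silently.
inline std::optional<std::string> to_ascii(native_string_view raw) {
    std::string out;
    out.reserve(raw.size());
    for (native_char unit : raw) {
        const auto code =
            static_cast<std::make_unsigned_t<native_char>>(unit);
        if (code > 0x7F) return std::nullopt;
        out.push_back(static_cast<char>(code));
    }
    return out;
}

}  // namespace demo
```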


  Support for unit tests, posix_spawn(), execve(), CreateProcess(), etc...

A sometimes frustrating limitation of std::filesystem is that there is
no support for proxy or virtual filesystems; it always reflects the
native host filesystem. I've most often experienced this in unit testing
situations in which it would be useful to be able to mock a filesystem.
Similar needs are more pervasive for environment blocks since process
creation interfaces like those listed above allow a custom environment
block to be prepared and passed in a well specified format (Microsoft
allows a custom environment block to be specified in either char-based
storage or wchar_t-based storage; see the CreateEnvironmentBlock()
<https://learn.microsoft.com/en-us/windows/win32/api/userenv/nf-userenv-createenvironmentblock>
function and the CREATE_UNICODE_ENVIRONMENT flag for CreateProcessA()
<https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa>).
I've also encountered a need to maintain and synchronize separate
environment variable blocks in database storage.
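For the POSIX case, execve() and posix_spawn() accept the environment as
a null-terminated array of pointers to "NAME=VALUE" strings. The sketch
below builds such an array from a std::map; the class name envp_block is
hypothetical. The Win32 block format differs and is not covered here.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace demo {  // hypothetical name, for illustration only

// Owns "NAME=VALUE" strings and a null-terminated pointer array that can
// be passed as the envp argument of execve() or posix_spawn().
class envp_block {
public:
    explicit envp_block(const std::map<std::string, std::string>& vars) {
        strings_.reserve(vars.size());
        for (const auto& [name, value] : vars)
            strings_.push_back(name + "=" + value);
        pointers_.reserve(strings_.size() + 1);
        for (auto& s : strings_)
            pointers_.push_back(s.data());
        pointers_.push_back(nullptr);  // required terminator
    }

    // Copying would leave the pointers referring to the source object.
    envp_block(const envp_block&) = delete;
    envp_block& operator=(const envp_block&) = delete;

    char* const* data() const { return pointers_.data(); }
    std::size_t size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;
    std::vector<char*> pointers_;
};

}  // namespace demo
```

A caller would pass block.data() as the envp argument and must keep the
envp_block object alive until the call returns.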

Such needs suggest that a design that provides matching interfaces for
multiple implementations is in order. For example:

  * An implementation that interacts directly with the host environment
    block (e.g., via POSIX or Win32 APIs).
  * An implementation that maintains an environment block in a local
    data structure (e.g., a std::map<KEY-TYPE, VALUE-TYPE>).
  * An implementation that wraps another one in a type-erased fashion.

Perhaps this hints at a parameterized design where std::environment is a
type alias of std::basic_environment<std::environment_manager> and where
additional managers (including user written ones) are provided to fit
other needs.
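A minimal sketch of such a parameterized design follows. Every name in it
(host_manager, map_manager, basic_environment, get, set) is an assumption
made for illustration, not something taken from P1275 or P1750. For
brevity it uses std::string rather than an implementation-defined
value_type, keeps the host manager read-only, and omits the type-erased
wrapper.

```cpp
#include <cstdlib>
#include <map>
#include <optional>
#include <string>
#include <utility>

namespace demo {  // hypothetical names, for illustration only

// Reads the host environment through std::getenv(); read-only here.
struct host_manager {
    std::optional<std::string> get(const std::string& name) const {
        const char* value = std::getenv(name.c_str());
        if (value == nullptr) return std::nullopt;
        return std::string(value);
    }
};

// Keeps variables in a local map; useful for tests or for preparing the
// environment of a child process.
class map_manager {
public:
    std::optional<std::string> get(const std::string& name) const {
        auto it = vars_.find(name);
        if (it == vars_.end()) return std::nullopt;
        return it->second;
    }
    void set(const std::string& name, std::string value) {
        vars_[name] = std::move(value);
    }

private:
    std::map<std::string, std::string> vars_;
};

template <class Manager>
class basic_environment {
public:
    basic_environment() = default;
    explicit basic_environment(Manager manager)
        : manager_(std::move(manager)) {}

    std::optional<std::string> get(const std::string& name) const {
        return manager_.get(name);
    }
    // Instantiated only when called, so read-only managers still compile.
    void set(const std::string& name, std::string value) {
        manager_.set(name, std::move(value));
    }

private:
    Manager manager_;
};

using environment = basic_environment<host_manager>;
using test_environment = basic_environment<map_manager>;

// Code written against the template can run on either environment.
template <class Env>
std::string editor_or_default(const Env& env) {
    return env.get("EDITOR").value_or("vi");
}

}  // namespace demo
```

A unit test can then construct a demo::test_environment, populate it with
set(), and pass it to code such as editor_or_default() without touching
the host environment.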

Tom.

Received on 2023-01-01 22:51:37