Date: Sun, 17 Oct 2021 17:27:26 -0400
On Sun, Oct 17, 2021 at 11:39 AM Bjarne Stroustrup <bjarne_at_[hidden]>
wrote:
> and if I understand the discussion correctly, objections are that
>
> (1) makes the meaning of a module name dependent on all details of a
> file system convention
>
Not at all. The standard defines the semantics for what are valid
identifiers as well as how they're normalized. This proposal would not
change that.
There are specific caveats called out in the paper. Specifically, modules
whose names differ only in case would likely conflict on case-insensitive
filesystems, and Unicode codepoints in module names would be subject to
portability issues depending on how the filesystem encodes file names.
We could, however, easily protect against both cases by replacing any
codepoint outside of [a-z0-9] in the identifier parts of the module name
with a simple escape like %UDEADBEEF% in the filename, where DEADBEEF would
be the hex number of the Unicode codepoint.
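To make the escaping idea concrete, here is a minimal sketch in C++17. It is
only an illustration, not text from the paper: the function names are
hypothetical, and it assumes the hex number is written as eight zero-padded
uppercase digits and that the dots separating identifier parts are kept as-is.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Escape one identifier part of a module name, given as Unicode codepoints.
// Codepoints in [a-z0-9] pass through; everything else becomes %UXXXXXXXX%.
// Assumption: eight zero-padded uppercase hex digits per escaped codepoint.
std::string escape_identifier(std::u32string_view part) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t cp : part) {
        const bool plain = (cp >= U'a' && cp <= U'z') || (cp >= U'0' && cp <= U'9');
        if (plain) {
            out.push_back(static_cast<char>(cp));
        } else {
            out += "%U";
            for (int shift = 28; shift >= 0; shift -= 4)
                out.push_back(hex[(static_cast<std::uint32_t>(cp) >> shift) & 0xF]);
            out.push_back('%');
        }
    }
    return out;
}

// Escape a dotted module name part by part.
// Assumption: the '.' separators themselves are left untouched.
std::string escape_module_name(std::u32string_view name) {
    std::string out;
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = name.find(U'.', start);
        const std::size_t len =
            (dot == std::u32string_view::npos) ? std::u32string_view::npos : dot - start;
        out += escape_identifier(name.substr(start, len));
        if (dot == std::u32string_view::npos) break;
        out.push_back('.');
        start = dot + 1;
    }
    return out;
}
```

Under these assumptions, foo.Bar would map to foo.%U00000042%ar, so it can no
longer collide with foo.bar on a case-insensitive filesystem, and the
resulting file name is plain ASCII regardless of the filesystem encoding.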
> (2) significantly slows down compilation by forcing lookup of many
> long file names
>
This is a bold claim. We have plenty of prior art in C++-aware build
systems, as well as in other languages, demonstrating that this is not a
significant factor in build times, at least not on POSIX systems. I would
like to see specific evidence that this is as big a problem as is being
claimed.
Moreover, the proposal specifically states that an early step of the build
configuration would be to perform that discovery, at which point you would
have the full mapping of all the relevant files related to the modules. So
this objection is limited to the performance impact of discovering how to
consume modules from the system, which should be a cost paid only once in
the build workspace.
Nothing in this proposal forces the discovery cost to be paid for every
compiler invocation.
Daniel
Received on 2021-10-17 16:27:38