On Tue, Nov 19, 2019, 19:17 JeanHeyd Meneide <phdofthehouse@gmail.com> wrote:
Corentin and SG7:

     I have put my comments in below intermingled with the original e-mail's text; I apologize if it is hard to read. As a summary:

     - I will not be pursuing std::embed at this time.
     - #embed is not going to be proposed to WG21 right now.
     - I will happily await any new direction for Compile Time I/O, perhaps in relation to having an entirely constexpr P1031 LLFIO (https://wg21.link/p1031r2) or within the context of a more powerful preprocessor/pre-Phase-7 meta-generator that looks like Dr. Sutton's work (https://www.youtube.com/watch?v=kjQXhuPX-Ac) or Sean Baxter's Circle (https://github.com/seanbaxter/circle). I am woefully uneducated about the details and minutiae of these approaches.

     P1040 - std::embed is frozen. If you would like to pursue it, feel free to write a new paper and reference P1040 (or even directly pull information out of it).

Hello, 
I am a bit concerned with the direction std::embed is going and I'd like to see if we can agree on a few things.

A preprocessor based approach does not offer sufficient benefits

I understand that JeanHeyd has been given enough contradictory guidance that he might be tempted to go with the preprocessor #embed solution.
I am concerned that this would not solve anything.

The entire value of std::embed is to improve compile time. Anything that would create a node per byte would have disastrous 
compile time performance. My observation is that parsing is not the bottleneck.

     Parsing, validating, and storing a sequence of expressions that ultimately is just a comma-delimited, brace-hugged list of numbers creates problems so bad that even Static Analysis companies have reached out to me in support of #embed and std::embed. The current specification for #embed states that it is treated "as if" a brace delimited list of integer-literal values is generated. A prudent implementation would generate a builtin for this to save on the parsing of exceptionally large arrays: both GCC and Clang implementers have stated that this is possible and easy to do (my implementation is a toy, and thusly does not do this quite yet).
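
To make the "as if" wording above concrete, here is a minimal sketch of what a brace-delimited list of integer literals looks like for a tiny resource. The file name greeting.txt and its five-byte contents are invented for illustration; the #embed line appears only in a comment, so the snippet compiles without compiler support for the directive.

```cpp
#include <cstddef>

// Hypothetical resource "greeting.txt" holding the five bytes "hello".
// Under the "as if" rule, writing
//     #embed "greeting.txt"
// between the braces below would behave like the literal list spelled out here.
inline constexpr unsigned char greeting[] = {
    104, 101, 108, 108, 111
};

// One array element per byte of the (assumed) file.
inline constexpr std::size_t greeting_size = sizeof(greeting);

static_assert(greeting_size == 5);
static_assert(greeting[0] == 104 && greeting[4] == 111);
```

An implementation that parses such a list naively creates one expression per byte, which is the cost both sides of this thread are worried about; the "as if" wording leaves room for a builtin that skips that work.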
 
Sure, that might be solvable by an attribute which would change how arrays are represented by the compiler.
Even then, the compiler would have to store the entire file's content in memory, whereas std::embed can be designed to be backed by a memory-mapped file.

     Having implemented this in 2 compilers, this was not my experience. For example, Clang already has a SourceManager and FileManager which caches data from files large and small, and string literals in many cases get folded together and interned for speed and memory savings. All the implementation needs to do is point to that memory, memory-mapped or not. This is the reason for the "as if" language in the wording, to allow builtins to take advantage of this or potentially other representations (e.g., "A dedicated AST node", as a person present in the EWG discussion of #embed put it).
 
And so a preprocessor-based approach would have little value over generated source files and suffer many of the same issues.
That it can be pushed through the committee faster should not be a reason to pursue that direction.

     I think you are fundamentally misinterpreting the reasons for #embed. The goal is to provide a before-Phase-7 (not-constexpr) way for scanners and dependency managers that read source code to directly resolve resources (typically, files) without potentially requiring full semantic analysis (e.g., almost every step of compilation before code generation). This enables current-generation distributed build systems to work. A preprocessor-based approach provides exactly that, and given the reception of P1130 (https://thephd.github.io/vendor/future_cxx/papers/d1130.html) by EWG, there was no appetite for a modules-specific syntax. There was also no appetite for a special kind of string literal for this; see P1040's "prior art" section (https://thephd.github.io/vendor/future_cxx/papers/d1040.html#design-prior).

     "Push through the Committee" is both presumptuous and extremely insulting to my treatment of the Process and the guidance laid out in P0939 (https://wg21.link/p0939) and P1000 (https://wg21.link/p1000). Expedience of Committee acceptance was never a goal; my work has always been thorough, user-focused, and pooled from listening to the vast number of users and their varied needs. #embed rose out of the need to support a simple + intuitive syntax for grabbing file contents in the simplest case ("keep simple things simple"), distributed build systems, and current dependency scanning tools: there are no other reasons. std::embed has sat in my lap for over a year, and despite several e-mails from hobbyist developers to U.S. National Lab engineers begging me to make progress, I took the time to understand the entirety of the constituency and propose solutions that solved their issues. If that is "push through the Committee", then I am not sure what other words you need to hear to be convinced otherwise.

I want to clarify that in no way do I intend to suggest that you were not diligent in your work; I was referring more to the poll taken by SG7. I just think that, despite #embed being simpler and having fewer dependencies, I personally don't see it as a viable solution. Sorry if I was misconstrued. The amount of work you did on this proposal commands respect.


     But, words are cheap: perhaps this e-mail showing that I am no longer pursuing the area altogether in WG21 will serve as ultimate proof of my commitment to the quality, not the speed, of the process.
 
Security concerns
It was always my understanding that embeddable resources would be found by a mechanism similar to include paths,
and as such giving full filesystem access was never really on the table?
Is such a flag insufficient? It would require inspecting what the build system does, rather than individual files.
I would be sympathetic to a per-file mechanism to identify resources that can be opened, but I would like it decoupled from std::embed.

i.e.:
#pragma resource "foo.txt"
[...]
std::embed("foo.txt")
We might also want to make all paths relative, and implementations can support an allowlist/denylist of resource paths.
I think it's important to support both files and directories - but supporting only files as a first approach seems reasonable.
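
As a rough sketch of the relative-path and allowlist idea above, the check below accepts a resource path only if it is relative, contains no ".." component, and sits under a single allowed directory. The function name resource_allowed, the single-root policy, and the '/' separator are all assumptions made for illustration; nothing here comes from P1040 or from any compiler.

```cpp
#include <string_view>

// Hypothetical policy check (not part of any proposal or compiler).
// Returns true when `path` is relative, never climbs out via "..",
// and lies strictly inside the allowed resource directory `root`.
constexpr bool resource_allowed(std::string_view path, std::string_view root) {
    // Absolute paths are rejected outright: relative paths only.
    if (path.empty() || path.front() == '/') return false;

    // Walk the '/'-separated components and reject any "..".
    std::string_view rest = path;
    while (!rest.empty()) {
        const auto slash = rest.find('/');
        const std::string_view part = rest.substr(0, slash);
        if (part == "..") return false;
        if (slash == std::string_view::npos) break;
        rest.remove_prefix(slash + 1);
    }

    // The path must be root, then '/', then at least one more character.
    return path.size() > root.size() + 1
        && path.substr(0, root.size()) == root
        && path[root.size()] == '/';
}

static_assert(resource_allowed("assets/foo.txt", "assets"));
static_assert(!resource_allowed("/etc/passwd", "assets"));
static_assert(!resource_allowed("assets/../secret.txt", "assets"));
```

A real implementation would presumably take its list of roots from the command line, much like include paths, and would need to handle platform-specific separators and symbolic links, which this sketch ignores.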
 
     I am woefully unequipped to answer security concerns. I wish you the best in figuring it out.

Tooling
A mechanism decoupled from std::embed, as in the paragraph above, would support the needs of tooling.
I don't think tooling support should stop std::embed though.
Having to specify dependencies on resources manually is reasonable given that resources should be few.

     The tooling vendors in the room for std::embed discussion strongly disagree with your assertions here. In the current world they can find every resource without having to invoke full compilation. std::embed makes that hard for them. This was cause for Weakly Against and Strongly Against votes; please see the Belfast Wiki of EWG and SG7 Discussion for the individuals and contact them directly (or they can self-volunteer information here).
 
Modules
To properly support std::embed and modules, BMIs should keep track of resource paths when they are compiled, relative to the BMIs or the source files.
The idea is that an importing module should have access to the same resources as the imported modules.
Alternatively it can be the responsibility of the build system to deal with that.

     I am not adept in how modules behave and can offer no reassurances here.

More General Solution ?
It is unclear to me that std::embed is not already the general solution.
It returns a span, which means it can be manipulated with any algorithm or view adapter offered by the standard library, most of which are constexpr already.
This is strictly more versatile than a file, which we have fewer tools to handle.
In fact I suspect a primary use case for memory-mapped files will be to wrap them in a span and use them with algorithms.

Moreover, the concerns that std::embed has security issues but is not general enough seem antithetical.
I don't think we want write capabilities, or I/O on file descriptors that are not files on disk, more for reproducibility than for security reasons.
I am also not convinced that the use cases for runtime I/O and compile-time I/O are the same.

Nevertheless, I see several solutions:
  • Making file and mapped_file as proposed by P1031[1] constexpr (partially, we need a small subset), which I believe Niall has been working towards
  • Continue with std::embed which does not preclude a constexpr stream-like interface on top of a span in the future 

Trying to blend compile-time and runtime I/O seems to me to be seductive only on the surface; in reality the use cases will be different,
and I believe std::embed serves all the use cases people actually care about in practice.

     SG7 sees that there is a need to solve this problem, but in Belfast they disagreed with the assertion that std::embed is the solution. SG7 oversees Compile-time Programming, and without their support std::embed goes nowhere. I also have no direction for continued work, except to try to envision a new kind of limited stream that pulls from a potential pool of compiler-specified directions. std::resourcestream, or something similar? It is unclear what the forward direction is or should be, and I do not have the time to join SG7 in discovering a potentially new direction for Compile-time Programming. Given the power that std::embed in its current form enables and the use cases it was designed to cover from Build Systems, and seeing the alternatives presented for P1040 in small-room SG7 discussion, I would rather not move forward in shaky, unsure territory and instead will freeze the current proposal progress.

     I have no plans on unfreezing it. Feel free to write your own paper(s) on the subject which covers the exact same space as std::embed; this message should serve as proof to individuals who ask "Did you Consult with Author X about P1040, it seems similar?" that you do not need to wait or wonder where this paper is going.

Best of Luck,
JeanHeyd Meneide