sg7: Re: [SG7] std::embed / Compile time IO

From: Corentin <corentin.jabot_at_[hidden]>
Date: Tue, 19 Nov 2019 19:32:43 +0100

On Tue, Nov 19, 2019, 19:17 JeanHeyd Meneide <phdofthehouse_at_[hidden]>
wrote:

> Corentin and SG7:
>
> I have put my comments in below intermingled with the original
> e-mail's text; I apologize if it is hard to read. As a summary:
>
> - I will not be pursuing std::embed at this time.
> - #embed is not going to be proposed to WG21 right now.
> - I will happily await any new direction for Compile Time I/O,
> perhaps in relation to having an entirely constexpr P1031 LLFIO (
> https://wg21.link/p1031r2) or within the context of a more powerful
> preprocessor/pre-Phase-7 meta-generator that looks like Dr. Sutton's work (
> https://www.youtube.com/watch?v=kjQXhuPX-Ac) or Sean Baxter's Circle (
> https://github.com/seanbaxter/circle). I am woefully uneducated about the
> details and minutiae of these approaches.
>
> P1040 - std::embed is frozen. If you would like to pursue it, feel
> free to write a new paper and reference P1040 (or even directly pull
> information out of it).
>
>> Hello,
>> I am a bit concerned with the direction std::embed is going and I'd like
>> to see if we can agree on a few things.
>>
>> A preprocessor based approach does not offer sufficient benefits
>>
>> I understand that JeanHeyd has been given enough contradictory guidance
>> that he might be tempted to go with the preprocessor #embed solution.
>> I am concerned that this would not solve anything.
>>
>> The entire value of std::embed is to improve compile time. Anything that
>> would create a node per byte would have disastrous
>> compile time performance. My observation is that parsing is not the
>> bottleneck.
>>
>
> Parsing, validating, and storing a sequence of expressions that
> ultimately is just a comma-delimited, brace-hugged list of numbers creates
> problems so bad that even Static Analysis companies have reached out to me
> in support of #embed and std::embed. The current specification for #embed
> states that it is treated "as if" a brace delimited list of integer-literal
> values is generated. A prudent implementation would generate a builtin for
> this to save on the parsing of exceptionally large arrays: both GCC and
> Clang implementers have stated that this is possible and easy to do (my
> implementation is a toy, and thusly does not do this quite yet).
>
>
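To make the "as if" wording above concrete, here is a minimal sketch. The file name greeting.txt and its five-byte content are assumptions made up for illustration, and the array is written out by hand rather than produced by a real #embed implementation.

```cpp
#include <cstddef>

// Assumption: a hypothetical resource "greeting.txt" holds the five bytes
// of the ASCII text "hello". Under the "as if" wording, an #embed of that
// file behaves as though the list of integer literals below had been
// written directly in the source.
constexpr unsigned char greeting[] = { 104, 101, 108, 108, 111 };

// The observable result is an ordinary array. Whether the compiler stores
// it as one node per element or as a single blob is not visible here.
constexpr std::size_t greeting_size = sizeof(greeting);

static_assert(greeting_size == 5);
static_assert(greeting[0] == 104);
```

Because only the observable array is specified, an implementation stays free to back it with a builtin or a dedicated representation instead of parsing each literal.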
>> Sure that might be solvable by an attribute which would change how arrays
>> are represented by the compiler.
>> Even then, the compiler would have to store the entire file's content in
>> memory whereas std::embed can be designed to be backed by a memory mapped
>> file.
>>
>
> Having implemented this in 2 compilers, this was not my experience.
> For example, Clang already has a SourceManager and FileManager which caches
> data from files large and small, and string literals in many cases get
> folded together and interned for speed and memory savings. All the
> implementation needs to do is point to that memory, memory-mapped or not.
> This is the reason for the "as if" language in the wording, to allow
> builtins to take advantage of this or potentially other representations
> (e.g., "A dedicated AST node", as a person present in the EWG discussion of
> #embed put it).
>
>
>> And so a preprocessor based approach would have little value over
>> generated source files and suffer many of the same issues.
>> That it can be pushed through the committee faster should not be a reason
>> to pursue that direction.
>>
>
> I think you are fundamentally misinterpreting the reasons for #embed.
> The goal is to provide a before-Phase-7 (not-constexpr) way for scanners
> and dependency managers that read source code to directly resolve resources
> (typically, files) without potentially requiring full semantic analysis
> (e.g., almost every step of compilation before code generation). This
> enables current-generation distributed build systems to work. A
> preprocessor-based approach provides exactly that, and given the reception
> of P1130 (https://thephd.github.io/vendor/future_cxx/papers/d1130.html)
> by EWG, there was no appetite for a modules-specific syntax. There was also
> no appetite for a special kind of string literal for this; see P1040's
> "prior art" section (
> https://thephd.github.io/vendor/future_cxx/papers/d1040.html#design-prior
> ).
>
> "Push through the Committee" is both presumptuous and extremely
> insulting to my treatment of the Process and the guidance laid out in P0939
> (https://wg21.link/p0939) and P1000 (https://wg21.link/p1000). Expedience
> of Committee acceptance was never a goal; my work has always been thorough,
> user-focused, and pooled from listening to the vast number of users and
> their varied needs. #embed rose out of the need to support a simple +
> intuitive syntax for grabbing file contents in the simplest case ("keep
> simple things simple"), distributed build systems, and current dependency
> scanning tools: there are no other reasons. std::embed has sat in my lap
> for over a year, and despite several e-mails from hobbyist developers to
> U.S. National Lab engineers begging me to make progress, I took the time to
> understand the entirety of the constituency and propose solutions that
> solved their issues. If that is "push through the Committee", then I am not
> sure what other words you need to hear to be convinced otherwise.
>

I want to clarify that I in no way meant to suggest that you were not
diligent in your work; I was referring to the poll taken by SG7. Although
#embed is simpler and has fewer dependencies, I personally don't see it as
a viable solution. Sorry if I was misconstrued. The amount of work you did
on this proposal commands respect.

> But, words are cheap: perhaps this e-mail showing that I am no longer
> pursuing the area altogether in WG21 will serve as ultimate proof of my
> commitment to the quality, not the speed, of the process.
>
>
>> Security concerns
>> It was always my understanding that embeddable resources would be found by
>> a mechanism similar to include paths,
>> and as such giving full filesystem access was never really on the table?
>> Is such a flag insufficient? It would require inspecting what the build
>> system does, rather than individual files.
>> I would be sympathetic to a per-file mechanism to identify resources that
>> can be opened, but would like it decoupled from std::embed.
>>
>> i.e.:
>> #pragma resource "foo.txt"
>> [...]
>> std::embed("foo.txt")
>> We might also want to make all paths relative, and implementations can
>> support a white/blacklist of resource paths.
>> I think it's important to support both files and directories, but
>> supporting only files as a first approach seems reasonable.
>>
>
> I am woefully unequipped to answer security concerns. I wish you the
> best in figuring it out.
>
>> Tooling
>> A mechanism decoupled from std::embed, as in the paragraph above, would
>> support the needs of tooling.
>> I don't think tooling support should stop std::embed though.
>> Having to specify dependencies on resources manually is reasonable given
>> that resources should be few.
>>
>
> The tooling vendors in the room for std::embed discussion strongly
> disagree with your assertions here. In the current world they can find
> every resource without having to invoke full compilation. std::embed makes
> that hard for them. This was cause for Weakly Against and Strongly Against
> votes; please see the Belfast Wiki of EWG and SG7 Discussion for the
> individuals and contact them directly (or they can self-volunteer
> information here).
>
>
>> Modules
>> To properly support std::embed and modules, BMIs should keep track of
>> resource paths when they are compiled, relative to the BMIs or the source
>> files.
>> The idea is that an importing module should have access to the same
>> resources as the imported modules.
>> Alternatively it can be the responsibility of the build system to deal
>> with that.
>>
>
> I am not adept in how modules behave and can offer no reassurances
> here.
>
>> More General Solution?
>> It is unclear to me that std::embed is not already the general solution.
>> It returns a span, which means it can be manipulated with any algorithm
>> or view adapter offered by the standard library, most of which are
>> constexpr already.
>> This is strictly more versatile than a file, for which we have fewer
>> tools.
>> In fact I suspect a primary use case for memory-mapped files will be to
>> wrap them in a span and use them with algorithms.
>>
>> Moreover, the concerns that std::embed has security problems but is not
>> general enough seem antithetical.
>> I don't think we want write capabilities, or I/O on file descriptors that
>> are not files on disk, more for reproducibility than security reasons.
>> I am also not convinced that the use cases for runtime I/O and
>> compile-time I/O are the same.
>>
>> Nevertheless, I see several solutions:
>>
>> - Making file and mapped_file as proposed by P1031[1] constexpr
>> (partially, we need a small subset), which I believe Niall has been working
>> towards
>> - Continue with std::embed which does not preclude a constexpr
>> stream-like interface on top of a span in the future
>>
>>
>> Trying to blend compile-time and runtime I/O seems to me to be seductive
>> only on the surface; in reality the use cases will be different,
>> and I believe std::embed serves all the use cases people actually care
>> about in practice.
>>
>
> SG7 sees that there is a need to solve this problem, but in Belfast they
> disagreed with the assertion that std::embed is the solution. SG7 strictly
> oversees Compile-time Programming, and without their support std::embed
> goes nowhere. I also have no direction for continued work, except to try to
> envision a new kind of limited stream that pulls from a potential pool of
> compiler-specified directions. std::resourcestream, or something similar?
> It is unclear what the forward direction is or should be, and I do not have
> the time to join SG7 in discovering a potentially new direction for
> Compile-time Programming. Given the power that std::embed in its current
> form enables and the use cases it was designed to cover from Build Systems
> and seeing the alternatives presented for P1040 in small-room SG7
> discussion, I would rather not move forward in shaky, unsure territory and
> instead will freeze the current proposal progress.
>
> I have no plans to unfreeze it. Feel free to write your own
> paper(s) on the subject which covers the exact same space as std::embed;
> this message should serve as proof to individuals who ask "Did you Consult
> with Author X about P1040, it seems similar?" that you do not need to wait
> or wonder where this paper is going.
>
> Best of Luck,
> JeanHeyd Meneide
>

Received on 2019-11-19 12:35:18