ISOCPP std-proposals List: Re: [std-proposals] Add an interface to std::fstream::open to open unnamed file wb+

From: Thiago Macieira <thiago_at_[hidden]>
Date: Tue, 25 Apr 2023 20:35:31 -0700

On Tuesday, 25 April 2023 18:28:11 PDT Louis Langholtz via Std-Proposals
wrote:
> Where path might be: std::filesystem::temp_directory_path() and with the
> prefix like std::ios_base:: elided for exposition. This would request
> creating the new file exclusively and in an unlinked state basically. On
> O_TMPFILE supporting systems this would exactly be what’s documented for
> O_TMPFILE (see https://man7.org/linux/man-pages/man2/open.2.html). Else,
> the libc++ implementation would fallback (to providing as similar as
> possible like by separately calling unlink for file). Similarly, without
> the C++23 “noreplace” flag would mean not specifying O_EXCL in conjunction
> with O_TMPFILE and allowing the resulting file to be linked back into the
> filesystem - i.e. “materialized” back into the filesystem like you’ve
> mentioned.

Please note Linux only allows materialisation of O_TMPFILE files in their
initial state. You can't relink a file that was unlinked, including an O_TMPFILE
file that was previously materialised. So the trick of unlinking a file just
after creating it only works if the file is never going to have a name, whether
the file is ephemeral or not. And other OSes don't have anything similar.
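
To make the Linux behaviour concrete, here is a minimal sketch of creating an
O_TMPFILE file and giving it a name once, using the /proc/self/fd route that
open(2) documents. The function names open_unnamed() and materialise() are
made up for illustration, and the sketch assumes /proc is mounted and that the
filesystem supports O_TMPFILE.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

#include <fcntl.h>    // open, AT_FDCWD, AT_SYMLINK_FOLLOW, O_TMPFILE (Linux)
#include <unistd.h>   // linkat, close

// Opens an unnamed file inside `dir`. Returns the fd, or -1 with errno set.
// O_EXCL is deliberately left out: combined with O_TMPFILE it would forbid
// linking the file into the filesystem later.
int open_unnamed(const std::string& dir) {
#if defined(__linux__) && defined(O_TMPFILE)
    return ::open(dir.c_str(), O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);
#else
    (void)dir;
    errno = ENOTSUP;  // no equivalent on this platform
    return -1;
#endif
}

// Gives the still-unnamed file behind `fd` the name `new_path`.
// Per the discussion above, this is expected to work only while the file is
// in its initial, never-linked state.
bool materialise(int fd, const std::string& new_path) {
    assert(fd >= 0);
#if defined(__linux__)
    const std::string proc_path = "/proc/self/fd/" + std::to_string(fd);
    return ::linkat(AT_FDCWD, proc_path.c_str(),
                    AT_FDCWD, new_path.c_str(), AT_SYMLINK_FOLLOW) == 0;
#else
    (void)new_path;
    errno = ENOTSUP;
    return false;
#endif
}
```

A caller would write through the fd first and call materialise() last, so the
name only ever appears with complete content behind it.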

That means you couldn't use the std::ios::tmpfile flag as you've described to
create a file that may eventually have a name, either because it needs to so
another process can reach it (and fd-passing is not an option), or because
this is the atomic-rename-to-save scenario.

I also don't think you can withdraw the FILE_FLAG_DELETE_ON_CLOSE on Windows
after the file is created.

This means that you need a different facility for "unnamed temporary file" and
for "file in this directory whose name I don't care about right now, but I
need to know it".

> Should a change to additionally support “materialization” of a temporary
> file be in a separate proposal or the same proposal? I was of the opinion
> that because they’re independent, doing separate follow-on proposal would
> be easier. I don’t want to get side-tracked on the pros-and-cons of
> different ways of possibly supporting “materialization” but for the sake of
> completeness, I had mentioned adding a “rename_to” function to std::fstream
> for this as well as renaming files opened by an fstream object.

For QTemporaryFile, materialisation is completely transparent to the user. The
object behaves as if it had had a name all along.

But the implementation may not be acceptable for the C++ Standard Library,
because such an operation may fail. It's very unlikely because the directory
entry exists, removing the ENOSPC error, and our random generator is very
good, but the possibility is still there. This is a delayed error, in an
operation that the Standard Library would probably design as "cannot fail".

More likely, we need the two facilities from above:

1) file does not need a name, now or ever, and will definitely not survive close()
This is the same use-case as tmpfile(). It can use O_TMPFILE on Linux, unlink()
on other Unix, and FILE_FLAG_DELETE_ON_CLOSE for Windows.

2) file will possibly need a name in the future and/or may not be deleted
This is the original QTemporaryFile use-case. It can't use
FILE_FLAG_DELETE_ON_CLOSE and can't be unlink()ed before returning. The
ability to use O_TMPFILE depends on whether the "what's the file name" getter
can have error conditions and may perform I/O.
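
For facility 1 on a generic Unix, the unlink() approach could look roughly
like the sketch below. The name open_scratch_file() and the "scratch-" prefix
are hypothetical; only mkstemp(), unlink() and close() are real POSIX calls.

```cpp
#include <cassert>
#include <string>

#include <stdlib.h>   // mkstemp (POSIX)
#include <unistd.h>   // unlink, close

// Returns an fd for a file in `dir` that no longer has a name, or -1 on
// failure. The data lives until the last descriptor is closed.
int open_scratch_file(const std::string& dir) {
    assert(!dir.empty());
    std::string tmpl = dir + "/scratch-XXXXXX";
    int fd = ::mkstemp(&tmpl[0]);        // fills in the XXXXXX, creates exclusively
    if (fd < 0)
        return -1;
    if (::unlink(tmpl.c_str()) != 0) {   // drop the name right away
        ::close(fd);
        return -1;
    }
    return fd;
}
```

Between mkstemp() and unlink() the file briefly has a visible name, which is
the window that O_TMPFILE closes on Linux.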

I find use-case #1 very limited and again I speak from experience. Such
temporary files are at best a scratch / swap space for the application. For the
application to believe it needs a file instead of just using memory, it must
believe the amount of data it needs to store is considerable. The other
scenario is when it needs a file API for some reason, though for std::fstream
this is a slightly unlikely scenario because APIs should be taking
std::istream and std::ostream instead.

Use-case #2 is required for QSaveFile to be implemented and it's what the vast
majority of applications need, because they converse with one another using
file names, not by passing file descriptors.
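
As an illustration of why use-case #2 needs a real name, here is a minimal
POSIX sketch of the atomic-rename-to-save pattern. It is not how QSaveFile is
implemented; save_atomically() is a hypothetical helper, and the sketch
assumes the temporary file can be created next to the target so that both are
on the same filesystem.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>     // std::rename
#include <string>

#include <stdlib.h>   // mkstemp
#include <unistd.h>   // write, fsync, close, unlink

// Writes `data` to `target` so that readers see either the old content or
// the complete new content, never a partial file.
bool save_atomically(const std::string& target, const std::string& data) {
    assert(!target.empty());
    // Same directory as the target, so rename() stays within one filesystem.
    std::string tmp = target + ".XXXXXX";
    int fd = ::mkstemp(&tmp[0]);          // note: creates the file with mode 0600
    if (fd < 0)
        return false;

    const char* p = data.data();
    size_t left = data.size();
    bool ok = true;
    while (left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n < 0 && errno == EINTR)
            continue;                     // interrupted, retry
        if (n <= 0) {
            ok = false;
            break;
        }
        p += n;
        left -= static_cast<size_t>(n);
    }
    if (ok && ::fsync(fd) != 0)           // flush before the name becomes visible
        ok = false;
    if (::close(fd) != 0)
        ok = false;

    if (ok && std::rename(tmp.c_str(), target.c_str()) == 0)
        return true;
    ::unlink(tmp.c_str());                // clean up the temporary on any failure
    return false;
}
```

The temporary has to keep its name until the rename, which is exactly what
rules out unlinking it early or opening it with FILE_FLAG_DELETE_ON_CLOSE.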

> Now the question is, can we come up with a proposed change to the C++
> standard that alleviates classes like QTemporaryFile from having to have so
> much infrastructure in them to get its functionality?

Yes and no. I think that there's a reasonable API to be proposed to the
standard (see above).

But no, I am not going to use std::fstream in Qt. With std::filesystem we're
beginning to see some benefit, in particular std::filesystem::path{,_view}. But
iostreams have very little appeal for us and that's not likely to change,
ever.

> Wouldn’t the flow for creating a new executable be like:
>
> 1. Create temporary file in same filesystem as executable & opened for
> writing. Directory matters but file component of the path doesn’t.
> 2. Write out the executable to the temporary file.
> 3. Rename the temporary file to the target name of the executable. Full
> pathname matters.

Not always, no. Sometimes, you don't need a specific name, but you need a name
that conforms to a rule. There's a general practice to give temporary files
their real extensions. One use-case I can come up with right now is to use
GNU tar's ability to choose the decompressor based on file name if you just use
-f (I never use the -z or -J or --zstd options, for example).

The use-case that I was talking about of an executable is a real one for me:
at work we have this tool that comes in two builds: one compiled with AVX code
for performance and one without. On macOS, you'd just use a fat binary with
two slices (one x86_64 and one x86_64h), but we needed the solution for Linux
and Windows. So I coded a simple wrapper that embedded both files (something
#embed would have helped!) and selected which one to run at runtime after
querying CPUID.

In this scenario, on Windows:
a) the name of the file does not matter, so long as it ends in .exe
b) the name needs to be selected randomly, because another instance of the
tool may already be extant on the filesystem
c) the file must be closed after writing before execve() or CreateProcess
d) the file can't be deleted until the child process has exited

Because the file needed to survive close(), it needed a real name, so O_TMPFILE
didn't help much even on Linux. This also prevented me from using a memfd,
because those disappear when you close them.
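
Requirements (a) to (d) can be sketched for the POSIX side as follows:
a random name with a fixed suffix, created exclusively, that survives close().
create_named_temp() and its parameters are hypothetical, and this is not the
code of the wrapper described above. A Windows version would need CreateFile
with CREATE_NEW instead of open() with O_EXCL.

```cpp
#include <cassert>
#include <cerrno>
#include <random>
#include <string>

#include <fcntl.h>    // open, O_CREAT, O_EXCL
#include <unistd.h>   // close, unlink

// Creates "<dir>/<prefix><12 random chars><suffix>" and returns its fd.
// On success `out_path` holds the chosen name; on failure returns -1.
int create_named_temp(const std::string& dir, const std::string& prefix,
                      const std::string& suffix, std::string& out_path) {
    assert(!dir.empty());
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    std::random_device rd;
    std::uniform_int_distribution<int> pick(
        0, static_cast<int>(sizeof(alphabet)) - 2);  // skip the trailing NUL

    for (int attempt = 0; attempt < 16; ++attempt) {
        std::string name = dir + "/" + prefix;
        for (int i = 0; i < 12; ++i)
            name += alphabet[pick(rd)];
        name += suffix;                               // e.g. ".exe"
        // O_EXCL: fail with EEXIST instead of reusing another instance's file.
        int fd = ::open(name.c_str(),
                        O_CREAT | O_EXCL | O_WRONLY | O_CLOEXEC, 0700);
        if (fd >= 0) {
            out_path = name;
            return fd;
        }
        if (errno != EEXIST)
            return -1;                                // real error, give up
    }
    errno = EEXIST;                                   // too many collisions
    return -1;
}
```

The caller writes the payload, closes the fd, starts the child from out_path,
and removes the file only after the child has exited.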

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DCAI Cloud Engineering

Received on 2023-04-26 03:35:34