I have a question about the proposal "Formatted output" (P2093). Why was C's stdout chosen as the default output stream instead of C++'s std::cout? I find it unusual that a new proposal depends on facilities from C rather than C++. It could be a problem if one mixes std::cout with the default std::print (called without a C++ stream argument) while syncing with stdio is off. IMO, mixing std::cout and std::print should work out of the box, without boilerplate such as passing std::cout as an argument.
The paper gives a couple of arguments in favour of C's stdout, but I'd argue that further investigation is needed. I'll quote the relevant passages from the paper and reply to each.
The paper says:
We propose adding a free function called print with overloads for writing to the standard output (the default) and an explicitly passed output stream object. The default output stream can be either stdout or std::cout. We propose using stdout for the following reasons:
- stdout is considerably faster on at least two major implementations (see § 12 Performance).
The benchmark provided in the paper is incomplete. It measures some standard functions like printf and ostream::operator<<(const char *), plus two calls to fmt::print. The problem is that it measures the formatted IO parts of the C standard library (printf) and the C++ standard library (operator<<), when it should be measuring the unformatted IO (fwrite, fputs, std::ostream::write). Internally, std::print should not depend on the formatted IO functions, only on the unformatted ones, because its formatting features differ from what printf and cout offer. Since it depends only on the unformatted IO offered by the standard library, that is the IO that should be measured. I have my own benchmarks, which show that C++ unformatted IO is even faster than C's on Linux.
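To make the distinction concrete, here is a minimal sketch of the split I have in mind: the formatting step produces the final bytes on its own, and only then is a single unformatted write issued. The names format_answer, write_unformatted and demo_roundtrip_through_ostream are hypothetical, and std::to_string merely stands in for whatever formatting engine std::print would really use.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical formatting step: builds the bytes without touching any stream.
// std::to_string is only a stand-in for the real formatting engine.
std::string format_answer(int value) {
    return "The answer is " + std::to_string(value) + ".\n";
}

// Sink 1: C unformatted output. Returns true if every byte was accepted.
bool write_unformatted(std::FILE* stream, std::string_view text) {
    assert(stream != nullptr);
    return std::fwrite(text.data(), 1, text.size(), stream) == text.size();
}

// Sink 2: C++ unformatted output. Returns true if the stream is still good.
bool write_unformatted(std::ostream& stream, std::string_view text) {
    stream.write(text.data(), static_cast<std::streamsize>(text.size()));
    return stream.good();
}

// Usage sketch: the same formatted bytes can be handed to either sink.
std::string demo_roundtrip_through_ostream(int value) {
    std::ostringstream sink;
    write_unformatted(sink, format_answer(value));
    return sink.str();
}
```

With this split, the only standard library IO on the hot path is fwrite or std::ostream::write, which is why I think those are the calls worth benchmarking.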
- Better compatibility with other formatted I/O facilities compared to std::cout and its associated std::streambuf that suffer from private buffering, localization and conversion services that must be synchronized at a lower level.
The author mentions private buffering, but C streams (FILE*) also have private buffering. Here is how things work, AFAIK.
- At the lowest level Linux offers file IO with file descriptors, see open(), close(), read(), write(). This IO is unbuffered. Inside the kernel there is probably some form of caching at various levels (file level, inode level, device block level etc.), but we don't control that.
- At C level we have the FILE data structure which internally uses the file descriptor and adds buffering on top of it, see setvbuf().
- At C++ level we have std::streambuf and std::filebuf, which do the same: they depend directly on the file descriptor API and add their own buffering. Only when sync_with_stdio is true does libstdc++ disable the C++ buffering inside std::cout and rely on the C buffering in stdout.
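The C layer of the list above can be observed directly. The following is a small sketch, assuming a POSIX system (fstat and fileno are not part of ISO C++) and a write smaller than the 4096-byte buffer; kernel_visible_size, BufferingObservation and observe_c_buffering are hypothetical names. It writes through a fully buffered FILE* and asks the kernel for the file size before and after fflush.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>      // std::FILE, std::setvbuf, std::fwrite; fileno on POSIX
#include <sys/stat.h>  // fstat (POSIX)

// Size of the file as the kernel currently sees it through descriptor fd.
long long kernel_visible_size(int fd) {
    struct stat st {};
    const int rc = fstat(fd, &st);
    assert(rc == 0);
    (void)rc;  // avoids an unused-variable warning when NDEBUG is defined
    return static_cast<long long>(st.st_size);
}

struct BufferingObservation {
    long long before_flush;  // file size seen by the kernel before fflush
    long long after_flush;   // file size seen by the kernel after fflush
};

// Writes len bytes through a fully buffered temporary FILE* and reports what
// the kernel has seen before and after the flush.
// Assumption: len is smaller than the 4096-byte buffer.
BufferingObservation observe_c_buffering(const char* text, std::size_t len) {
    std::FILE* f = std::tmpfile();
    assert(f != nullptr);
    char buffer[4096];
    std::setvbuf(f, buffer, _IOFBF, sizeof buffer);  // before any other I/O on f
    std::fwrite(text, 1, len, f);

    BufferingObservation seen{};
    seen.before_flush = kernel_visible_size(fileno(f));
    std::fflush(f);
    seen.after_flush = kernel_visible_size(fileno(f));

    std::fclose(f);  // closed while buffer is still alive
    return seen;
}
```

On Linux with glibc I would expect before_flush to be 0 and after_flush to equal len, i.e. the bytes sit in the FILE's private buffer until the flush.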
C++ streambuf is not localized by default. It holds a locale object because it depends on codecvt, but the default codecvt<char, char, mbstate_t> always returns noconv. wstreambuf is a different story. C streams also have their own encoding conversions built in when wchar_t is mixed in, for example:
printf("%ls\n", L"wide string");   // has encoding from wide to narrow multibyte
                                   // during string formatting
wprintf(L"ABC\n");                 // has encoding from wide to narrow
                                   // before sending to stdout from the OS
wprintf(L"%s\n", "narrow string"); // has encoding from narrow to wide
                                   // during string formatting and again
                                   // from wide to narrow before sending
                                   // to the OS
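Coming back to the narrow-char C++ side, the claim about codecvt<char, char, mbstate_t> can be checked on any stream buffer. This is a small sketch; narrow_conversion_is_noop is a hypothetical helper name, and it only queries the facet held by the buffer's locale.

```cpp
#include <cassert>
#include <cwchar>     // std::mbstate_t
#include <iostream>
#include <locale>
#include <streambuf>

// Returns true if the narrow codecvt facet of the buffer's locale reports
// that it never performs a conversion.
bool narrow_conversion_is_noop(const std::streambuf& buf) {
    using narrow_codecvt = std::codecvt<char, char, std::mbstate_t>;
    const std::locale loc = buf.getloc();
    assert(std::has_facet<narrow_codecvt>(loc));  // required in every locale
    return std::use_facet<narrow_codecvt>(loc).always_noconv();
}
```

Calling narrow_conversion_is_noop(*std::cout.rdbuf()) should return true, which is what I mean when I say the narrow streambuf does no encoding conversion.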
How std::print behaves when wchar_t is involved should perhaps be a separate discussion. My focus here is the performance aspect.
print won’t use any formatted output functionality of ostream.
It can still depend only on the unformatted IO and be fast. Implementations of std::print can even grab the std::streambuf inside ostream and work directly with it to remove the overhead of std::ostream::write.
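A rough sketch of what working directly with the streambuf could look like is below. put_via_streambuf and capture_via_streambuf are hypothetical names, not anything from the paper or from a standard library implementation. The sentry keeps the usual stream state checks and tie() flushing, and the bytes then go straight to sputn.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper: writes text through the stream's buffer directly.
// Returns false if the stream was not usable or the buffer refused bytes.
bool put_via_streambuf(std::ostream& os, std::string_view text) {
    const std::ostream::sentry guard(os);  // checks state, flushes tie()
    if (!guard) return false;
    assert(os.rdbuf() != nullptr);  // a good() stream always has a buffer

    const auto wanted = static_cast<std::streamsize>(text.size());
    if (os.rdbuf()->sputn(text.data(), wanted) != wanted) {
        os.setstate(std::ios_base::badbit);
        return false;
    }
    return true;
}

// Usage sketch: capture the written bytes in a string for inspection.
std::string capture_via_streambuf(std::string_view text) {
    std::ostringstream sink;
    put_via_streambuf(sink, text);
    return sink.str();
}
```

Whether this is measurably faster than std::ostream::write is not something the sketch shows; it only illustrates that no formatted ostream functionality is needed.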
With all this said, maybe the real reason stdout was chosen is that the Windows trick with GetConsoleMode(_get_osfhandle(_fileno(stream)), ...) can only be done on C's stdout. But that is not a problem for standard library implementations; it is a problem only for the fmt library.
I will now show the benchmark with a few different invocations.
// Filename: cout.cpp
#include <cstdio>
#include <iostream>
#include <benchmark/benchmark.h>

// Formatted C output with an integer conversion.
void bm_printf_with_number(benchmark::State& s) {
    while (s.KeepRunning())
        std::printf("The answer is %d.\n", 42);
}
BENCHMARK(bm_printf_with_number);

// Formatted C output, format string without conversions.
void bm_printf_with_string(benchmark::State& s) {
    while (s.KeepRunning())
        std::printf("The answer is 42.\n");
}
BENCHMARK(bm_printf_with_string);

// Unformatted C output: one element of sizeof(str)-1 bytes.
void bm_fwrite(benchmark::State& s) {
    const char str[] = "The answer is 42.\n";
    while (s.KeepRunning())
        std::fwrite(str, sizeof(str) - 1, 1, stdout);
}
BENCHMARK(bm_fwrite);

// Unformatted C output: sizeof(str)-1 elements of one byte each.
void bm_fwrite_2(benchmark::State& s) {
    const char str[] = "The answer is 42.\n";
    while (s.KeepRunning())
        std::fwrite(str, 1, sizeof(str) - 1, stdout);
}
BENCHMARK(bm_fwrite_2);

// Unformatted C output of a null-terminated string.
void bm_fputs(benchmark::State& s) {
    const char str[] = "The answer is 42.\n";
    while (s.KeepRunning())
        std::fputs(str, stdout);
}
BENCHMARK(bm_fputs);

// Like fputs, but puts appends the newline itself.
void bm_puts(benchmark::State& s) {
    const char str[] = "The answer is 42.";
    while (s.KeepRunning())
        std::puts(str);
}
BENCHMARK(bm_puts);

// Formatted C++ output with an integer, syncing with stdio off.
void bm_ostream_with_number(benchmark::State& s) {
    std::ios::sync_with_stdio(false);
    while (s.KeepRunning())
        std::cout << "The answer is " << 42 << ".\n";
}
BENCHMARK(bm_ostream_with_number);

// Formatted C++ output of a string literal, syncing with stdio off.
void bm_ostream_with_string(benchmark::State& s) {
    std::ios::sync_with_stdio(false);
    while (s.KeepRunning())
        std::cout << "The answer is 42.\n";
}
BENCHMARK(bm_ostream_with_string);

// Unformatted C++ output, syncing with stdio off.
void bm_ostream_write(benchmark::State& s) {
    std::ios::sync_with_stdio(false);
    const char str[] = "The answer is 42.\n";
    while (s.KeepRunning())
        std::cout.write(str, sizeof(str) - 1);
}
BENCHMARK(bm_ostream_write);

BENCHMARK_MAIN();
// End file.
Compile with:
g++ -O2 cout.cpp -lbenchmark -lpthread
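Run with each of the following, which send the benchmarked output to /dev/null, the terminal, a regular file and a pipe, respectively: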
./a.out --benchmark_out=result.txt --benchmark_out_format=console > /dev/null && cat result.txt
./a.out --benchmark_out=result.txt --benchmark_out_format=console && cat result.txt
./a.out --benchmark_out=result.txt --benchmark_out_format=console > temp.txt && cat result.txt && rm temp.txt
./a.out --benchmark_out=result.txt --benchmark_out_format=console | grep -v "^T" && cat result.txt
The results will vary, but they should show that std::cout.write with sync_with_stdio set to false is either the fastest or competitive with fwrite. Please run this benchmark on the various platforms you work on and report the results.
Best,
Dimitrij.