Re: [SG16-Unicode] P1689: Encoding of filenames for interchange

From: Lyberta <lyberta_at_[hidden]>
Date: Sat, 07 Sep 2019 15:26:00 +0000
Thiago Macieira:
> On Friday, 6 September 2019 19:17:00 PDT Lyberta wrote:
>> I think if the machine-readable output depends on locale, the author of
>> the program seriously messed up.
> Oh, I agree with you. The problem is that the standard C library (as extended
> by POSIX) does not provide the API to make that happen *and* support
> internationalisation. And that's assuming the tool even has a "machine
> readable" format in the first place. In the Unix tradition, you just scrape
> the output of tools.

Then don't use the standard C library. On POSIX use open(), read() and
write(), have your own Unicode layer on top and read/write UTF-8 JSON if
you want to output anything machine-readable.
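
A rough sketch of what I mean, assuming a POSIX system and a filename that is already valid UTF-8. The names json_escape, write_all and emit_record are made up for illustration, and the "file" key is not taken from P1689 or any other proposal. Nothing here consults the locale.

```cpp
#include <cerrno>
#include <cstdio>
#include <string>
#include <string_view>
#include <unistd.h>

// Hypothetical helper: quote and escape a string as a JSON string value.
// Bytes >= 0x80 are copied through unchanged, so the input is assumed
// to be valid UTF-8 already (this sketch does not validate it).
std::string json_escape(std::string_view utf8)
{
    std::string out;
    out.reserve(utf8.size() + 2);
    out += '"';
    for (unsigned char c : utf8) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {
                // Remaining control characters must be \u-escaped in JSON.
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    out += '"';
    return out;
}

// Hypothetical helper: write the whole buffer with write(),
// retrying on short writes and on EINTR.
bool write_all(int fd, std::string_view data)
{
    while (!data.empty()) {
        ssize_t n = ::write(fd, data.data(), data.size());
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data.remove_prefix(static_cast<std::size_t>(n));
    }
    return true;
}

// Hypothetical helper: emit one JSON object per line, e.g. {"file":"main.cpp"}.
bool emit_record(int fd, std::string_view utf8_name)
{
    std::string line = "{\"file\":" + json_escape(utf8_name) + "}\n";
    return write_all(fd, line);
}
```

Calling emit_record(STDOUT_FILENO, "main.cpp") writes the line {"file":"main.cpp"} to standard output, whatever LANG or LC_ALL is set to.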

There is no such thing as plain text and Unix philosophy is dead.

I have a C++ proposal for binary IO/serialization here:


It was already reviewed by Niall twice and hopefully by C++23 we'll have
sane binary IO in the standard. I don't have plans to fix C at this
point though because it doesn't have an analog of std::byte yet.

> But the input is not Unicode, it's file paths. On Unix, it is possible to pass
> binary input in the command-line. With some effort, you can even pass NULs to
> specially crafted receiver applications. The std::filesystem API appears to
> have a way to retrieve the native raw format, which some application may need.

Yeah, it's all because C decided to have char as both bytes and
characters and doomed us all to ~50 years of pain. We need to decide if
main() should get bytes or text and fix it.
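
To illustrate the "native raw format" point, here is a small sketch. It assumes a POSIX system, where std::filesystem::path::value_type is char and the path stores the bytes it was given without conversion. The names path_from_arg and native_bytes are made up for illustration.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

// This sketch only makes sense where paths are narrow byte strings.
static_assert(std::is_same_v<std::filesystem::path::value_type, char>,
              "sketch assumes a POSIX-style narrow path representation");

// Hypothetical helper: build a path from a raw argv element.
// On POSIX the bytes are stored as given, valid UTF-8 or not.
std::filesystem::path path_from_arg(const char* arg)
{
    return std::filesystem::path(std::string(arg));
}

// Hypothetical helper: get the bytes back exactly as the OS would see them.
std::string native_bytes(const std::filesystem::path& p)
{
    return p.native();
}
```

A name such as "caf\xe9.txt" (Latin-1, not valid UTF-8) goes through path_from_arg and native_bytes unchanged, which is exactly why it can't be treated as text without an explicit decoding step.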

Received on 2019-09-07 17:26:19