Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Davis Herring <herring_at_[hidden]>
Date: Thu, 9 Jun 2022 10:31:32 -0600
> I still feel that option 1 is overly specific in requiring all input
> files to be sequences of integers. It's not something we need to
> specify, so we shouldn't.

At the risk of repeating something I think I said a few teleconferences
ago, I think it's even a bit more fundamental than that: I don't think
it's _useful_ to specify that input files are sequences of integers.
What we want that to mean is that the input file is the sequence of
bytes obtained via open(2) and read(2), but we have no means of
referring to such APIs, and so saying that there is _some_ sequence of
integers and the implementation must support such a sequence if it
encodes code points via UTF-8 is toothless. I can always say that the
sequence of integers that is the user's source file is the result of
encoding into UTF-8 my decoding of the "actual file" as Windows-1252
(somewhat like record-oriented files under VMS), defeating the point of
the proposal.
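
To make the loophole concrete, here is a minimal C++ sketch of the
reinterpretation described above. The function names are hypothetical, and
the five bytes that Windows-1252 leaves unassigned are passed through as C1
control characters, which is an assumption borrowed from common decoder
practice rather than anything the proposal specifies.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: map one Windows-1252 byte to a Unicode code point.
// Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252;
// this sketch assumes they map to the C1 controls of the same value.
char32_t cp1252_to_code_point(unsigned char b) {
    static constexpr std::array<char16_t, 32> high = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};
    if (b >= 0x80 && b <= 0x9F) return high[b - 0x80];
    return b;  // 0x00-0x7F and 0xA0-0xFF coincide with Unicode
}

// Append the UTF-8 encoding of cp (at most U+FFFF here) to out.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical "implementation-defined view" of a file: decode the raw
// bytes as Windows-1252, then re-encode the code points as UTF-8.
std::string cp1252_bytes_to_utf8(std::string_view raw) {
    std::string out;
    for (char c : raw) {
        append_utf8(out, cp1252_to_code_point(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Every possible byte string comes out of cp1252_bytes_to_utf8 as well-formed
UTF-8, including input such as a lone 0xFF byte that is not UTF-8 at all.
An implementation could therefore claim that this output is "the sequence
of integers" and satisfy the wording without ever reading the file as
UTF-8.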

It's surely impossible to actually define this problem out of existence,
but I think Hubert's suggestion already mentioned is as close as it is
possible to get.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or 
too sparse, it is because mass-energy conversion has occurred during 
shipping.

Received on 2022-06-09 16:31:37