Subject: Re: [SG16-Unicode] Do we really need basic_text_view?
From: Tom Honermann (tom_at_[hidden])
Date: 2018-08-04 14:39:29


On 08/03/2018 10:41 PM, Lyberta wrote:

>
>> I think the type aliases are useful for non-deduced contexts.  For
>> example, when declaring function parameters.
> Right, then we need some good names. I think we should break the
> convention established by basic_string. I suggest these:
>
> ecs_text_view, wecs_text_view, utf8_text_view, utf16_text_view,
> utf32_text_view. That is assuming the paper that establishes UTF-16 and
> UTF-32 as encoding for char16/32_t literals is accepted.

Strong motivation would be needed to break with existing conventions. 
Support for CTAD might be enough to consider renaming 'basic_text_view'
to 'text_view' and renaming the 'text_view' type alias to 'ntext_view',
but I think such naming decisions should be made with LEWG guidance.  I
don't see motivation for breaking with the common 'w', 'u8', 'u16', and
'u32' prefixed names.
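
To make the trade-off concrete, here is a minimal sketch of how CTAD and
the type aliases would coexist.  The class below is a hypothetical
stand-in, not the proposed text_view interface: the encoding tag types,
the constructor signature, and the code_unit_count() member are all
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical encoding tags (assumed for this sketch only).
struct execution_encoding {};
struct utf16_encoding {};

// Hypothetical stand-in for the view template under discussion.
template <class Encoding, class CharT>
class basic_text_view {
public:
    basic_text_view(Encoding, std::basic_string_view<CharT> units)
        : units_(units) {}
    std::size_t code_unit_count() const { return units_.size(); }
private:
    std::basic_string_view<CharT> units_;
};

// Aliases following the existing prefix convention.
using text_view    = basic_text_view<execution_encoding, char>;
using u16text_view = basic_text_view<utf16_encoding, char16_t>;

// A function parameter is a non-deduced context: CTAD cannot supply the
// template arguments here, so a concrete type (the alias) must be named.
std::size_t count_code_units(text_view tv) { return tv.code_unit_count(); }

inline std::size_t ctad_example() {
    // For a local variable, CTAD (C++17) deduces both template arguments
    // from the constructor arguments.
    basic_text_view tv(execution_encoding{}, std::string_view("hello"));
    static_assert(std::is_same_v<decltype(tv), text_view>);
    assert(tv.code_unit_count() == 5);
    return count_code_units(tv);
}
```

With CTAD the template name is what users write at the point of
construction, which is the argument for giving it the shorter name; the
aliases remain necessary wherever the type has to be spelled out.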

>
>> I don't think it is feasible to avoid the execution character encoding
>> given that it is the encoding used for I/O.  Eventually, we may be able
>> to add I/O interfaces that implicitly transcode at program boundaries,
>> but we don't have that yet.  I think beginners should be able to write
>> hello world without having to (explicitly) deal with transcoding.  For
>> many applications, the execution character encoding is the right
>> encoding to target.
> I think we should carefully consider what a modern I/O library should
> look like and then design for it. I think I/O should be in terms of
> std::byte. I hope integers will be 2s complement soon so serialization
> of integers won't be a problem. Since code units are just integers, we
> should just work on top of that.

I don't think redefining I/O in terms of std::byte would help solve
text-related problems.  For console-based programs, stdin and stdout will
continue to have an associated encoding that is necessarily determined
(for interoperability purposes) by the environment the program is
running in.  We could, of course, design an I/O library that implicitly
transcodes from the externally determined encoding to a program
determined internal encoding.  Whether that would be a good thing to do
or not is not something I've developed strong opinions about yet.  There
are significant challenges here since native I/O on most platforms uses
the execution character encoding, but Windows' native I/O uses the wide
execution character encoding (narrow interfaces implicitly transcode, in
ways that don't always work as expected).  Bridging these differences
may require defining a "native" or "system" encoding that is used for
stdin, stdout, environment variables, command line options, etc.
Separate encodings may be necessary for file names and text file
contents since those may differ from other I/O.
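
As a rough illustration of what transcoding at a program boundary
involves, the sketch below converts from an internal encoding to the one
a boundary expects.  It assumes, purely for illustration, that the
program's internal encoding is UTF-8 and that the boundary wants UTF-16
(as Windows' wide interfaces do in practice).  The function name
utf8_to_utf16 is hypothetical, and the input is assumed to be well-formed
UTF-8; no validation is performed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: transcode well-formed UTF-8 to UTF-16.
// Malformed input is not diagnosed; a truncated trailing sequence is dropped.
std::u16string utf8_to_utf16(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the high bits.
        if (b0 < 0x80)      { cp = b0;        len = 1; }
        else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }
        else                { cp = b0 & 0x07; len = 4; }
        if (i + len > in.size()) break;  // truncated sequence: stop
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

An I/O library that transcodes implicitly would have to perform a step
like this on every read and write, and would additionally need to know
which encoding applies to each boundary, which is where a "native" or
"system" encoding would come in.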

Tom.

