SG16

Subject: Re: [SG16-Unicode] Namespaces
From: Steve Downey (sdowney_at_[hidden])
Date: 2019-04-12 15:45:50


I'm not placing ECS and WECS?

I'd also like to see it be possible for vendors to supply extended
encodings, as they do for locale today. We might want to stash those in a
sub-namespace. In particular, while not mandating it, I would like to
encourage vendors to supply the whatwg user agent required list. Although
if the spec is open for extension in the right direction, it becomes less
critical, as someone else can provide the library. Locale, as specified, is
only usefully extensible by the vendor. POSIX does provide extension
mechanisms, but neither C nor C++ is required to use them.
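
To make the "open for extension" point concrete, here is a minimal sketch of
how a non-vendor encoding could plug into a generic algorithm. Every name in
it (`latin1_encoding`, `decode_one`, `decode_all`) is hypothetical and is not
taken from any proposal; it only assumes that an encoding is a type exposing
a code unit type and a single-step decode function.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical user-supplied encoding: ISO-8859-1 (Latin-1), where every
// byte maps directly to the Unicode scalar value with the same number.
struct latin1_encoding {
    using code_unit = char;

    // Decodes one scalar value from the front of `in`, reports how many
    // code units were consumed, and returns the scalar value.
    static char32_t decode_one(std::basic_string_view<code_unit> in,
                               std::size_t& consumed) {
        consumed = 1;
        return static_cast<char32_t>(static_cast<unsigned char>(in.front()));
    }
};

// Hypothetical generic algorithm: it relies only on the members above,
// so any type with the same shape can be swapped in by a user.
template <typename Encoding>
std::u32string decode_all(
    Encoding, std::basic_string_view<typename Encoding::code_unit> in) {
    std::u32string out;
    while (!in.empty()) {
        std::size_t consumed = 0;
        out.push_back(Encoding::decode_one(in, consumed));
        in.remove_prefix(consumed);
    }
    return out;
}
```

A call such as `decode_all(latin1_encoding{}, bytes)` would then work without
the vendor knowing anything about Latin-1, which is the kind of extension
that locale does not usefully allow.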

On Fri, Apr 12, 2019 at 3:00 PM Lyberta <lyberta_at_[hidden]> wrote:

> JeanHeyd Meneide:
> > On Fri, Apr 12, 2019 at 6:45 AM Lyberta <lyberta_at_[hidden]> wrote:
> >
> >> I guess at least teachability and clean structure. The guidance would
> >> be: "stuff in std is old and unusable for text, stuff in std::text is
> >> new and usable".
>
> Previously it was suggested to focus on Unicode so I no longer propose
> std::text namespace but I think we should put Unicode into std::unicode.
>
> It was also suggested to only provide proper API for Unicode and support
> other character sets via transcoding. Hence maybe instead of std::text
> we should have std::unicode::text.
>
> In particular, I don't think it is a good idea to support holding ECS
> or WECS inside std::[unicode::]text.
>
> > The sub-namespace isn't really necessary here because we are not in
> > competition for certain names or algorithms, save for the 3 names I want
> > to specifically name `std::text_decode`, `std::text_encode`,
> > `std::text_transcode`, and similar because it internally implies other
> > semantics and I do not want to steal the names `decode` and `encode` when
> > those are much more broad terms.
>
> Is there a proposal for those?
>
> >
> > Other names such as `std::utf8`, `std::utf32`, `std::utf16`,
> > `std::wide_execution`, and `std::narrow_execution` are fairly specific to
> > the text domain and I don't see them clashing.
>
> Maybe a bit off-topic but I don't think std::narrow_execution and
> std::wide_execution are good names. I think appending _character_set
> would make them less ambiguous.
>
> > Regarding earlier points on what the standard does provide: the standard
> > needs to provide encodings for all the encoding types that are (currently)
> > pushed out by the standard, and nothing more. This includes: std::utf8,
> > std::utf16, std::utf32, std::wide_execution, and std::narrow_execution.
>
> I agree but I want to stress that it would be a good idea to provide
> only minimal support for ECS and WECS (i.e. transcoding only) and just
> let users migrate to Unicode.
>
> > The
> > standard should not vend any other encodings, but the Encoding and Decoding
> > interfaces should be standard -- much like Allocator -- that allows a user
> > to swap in their own class type and object that replaces the use of an
> > encoding in any interface / function standard templates provide. (Similar
> > to char_traits, except not as useless.) This means users can employ
> > whatever encoding or power they have under the hood and enjoy fast and
> > correct text processing so long as they follow the required semantics.
>
> Again, it's been suggested to provide a full-fledged API for Unicode only.
>
> > Note that we cannot only ship utf8 as an encoding, because the standard
> > already ships and acknowledges more than utf8 as one of the encodings for
> > string literals. It would be highly dysfunctional to have utf16 string
> > literals that the standard library itself cannot process in a reasonable
> > manner.
>
> Yes, supporting UTF-16 and UTF-32 is trivial if algorithms work in terms
> of scalar values (which they should).
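
As a rough illustration of the scalar value point quoted above, the sketch
below decodes UTF-16 and UTF-32 into scalar values and then runs one
algorithm over the result. The function names are hypothetical and not from
any proposal, and the UTF-16 decoder assumes well-formed input (it does not
detect unpaired surrogates).

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Decodes UTF-16 into scalar values. Assumption: the input is well-formed;
// unpaired surrogates are passed through unchecked in this sketch.
inline std::u32string scalars_from_utf16(std::u16string_view in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size()) {
            // Combine a lead and a trail surrogate into one scalar value.
            char32_t trail = in[++i];
            unit = 0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00);
        }
        out.push_back(unit);
    }
    return out;
}

// The code units of well-formed UTF-32 already are scalar values.
inline std::u32string scalars_from_utf32(std::u32string_view in) {
    return std::u32string(in);
}

// A hypothetical algorithm written once, against scalar values only.
inline std::size_t count_non_ascii(std::u32string_view scalars) {
    std::size_t n = 0;
    for (char32_t c : scalars) {
        if (c > 0x7F) ++n;
    }
    return n;
}
```

With this split, `count_non_ascii` never needs to know which encoding the
text arrived in; only the small decoding step differs per encoding.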
>