Re: [EXTERNAL] Re: [isocpp-lib] Why have we deprecated filesystem::u8path?

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Tue, 29 Nov 2022 17:02:01 +0100
On Tue, Nov 29, 2022 at 4:59 PM Nicole Mazzuca via SG16 <
sg16_at_[hidden]> wrote:

> I think it was a noble idea, but fundamentally a non-zero number of people
> use `string` as a utf-8 container, and cannot switch. It should not be
> considered the default, but it should certainly be supported without
> allocation to a completely different type (or a reinterpret_cast).

Note that I'm working on a non-allocating, well-defined conversion.
It does require core folks' involvement, so I'm hoping we can make progress
on that sometime next year, hopefully in Issaquah.

> Nicole
> On Nov 29, 2022, at 07:45, Tom Honermann <tom_at_[hidden]> wrote:
> Sorry for the delay in responding.
> u8path was deprecated with the adoption of P0482R6
> <https://wg21.link/p0482r6>.
> I confirmed that I neglected to include motivation for its deprecation in
> that paper. The closest the paper gets to such motivation is in the
> discussion of u8path in the Motivation
> <https://wg21.link/p0482r6#motivation>
> section:
> To accommodate UTF-8 encoded text, the file system library specifies the
> following factory functions. Matching factory functions are not provided
> for other encodings.
> template <class Source>
>   path u8path(const Source& source);
> template <class InputIterator>
>   path u8path(InputIterator first, InputIterator last);
> The requirement to construct path objects using one interface for UTF-8
> strings vs another interface for all other supported encodings creates
> unnecessary difficulties for portable code. Consider an application that
> uses UTF-8 as its internal encoding on POSIX systems, but uses UTF-16 on
> Windows. Conditional compilation or other abstractions must be implemented
> and used in otherwise platform neutral code to construct path objects.
> The original motivation for deprecation was that u8path was only added
> because the path constructor, per [fs.path.type.cvt]
> <http://eel.is/c++draft/fs.path.type.cvt>,
> already specified different behavior for construction via a range of char;
> u8path therefore provided redundant functionality once char8_t was added.
> I think deprecation is still justified on design grounds. The standard
> currently associates the following encodings with char:
> 1. The *ordinary literal encoding* ([lex.ccon.literal]
> <http://eel.is/c++draft/tab:lex.ccon.literal>,
> [lex.string.literal]
> <http://eel.is/c++draft/tab:lex.string.literal>)
> used for character and string literals.
> 2. The *execution character set* ([character.seq.general]p(1.2)
> <http://eel.is/c++draft/library#character.seq.general-1.2>)
> used for the locale dependent execution environment.
> 3. The multibyte character encoding ([c.mb.wcs]
> <http://eel.is/c++draft/c.mb.wcs>,
> C: Multibyte characters) which is effectively the encoding of the *execution
> character set*.
> 4. The *native encoding* ([fs.path.type.cvt]p1
> <http://eel.is/c++draft/fs.path.type.cvt#1>)
> used for path names.
> Though the standard doesn't require it, the intent is that these encodings
> are all compatible. In practice, they do get out of sync; the locale of the
> execution environment is not generally known when encoding character and
> string literals and filesystem encoding may differ from the locale
> dependent encoding.
> Adding an additional association with UTF-8 creates a deeper division. We
> know that programmers have a hard time maintaining encoding expectations;
> mojibake remains a common occurrence. From a design perspective, if we
> endorse continued use of u8path, should we also add char-based UTF-8
> specific variants of std::basic_string, std::char_traits, and std::ctype?
> It isn't clear to me that path names are sufficiently special to warrant
> special interfaces; particularly when most filesystems in use today (NTFS
> being a partial exception) do not require a particular encoding (most just
> require a specific value for the '/' and '\0' characters). As we seek to
> add more Unicode features to the standard library, should we add UTF-8
> based interfaces for char and char8_t (and unsigned char since some
> projects use that for UTF-8)? I think the standard should avoid further
> muddying the waters of what encoding(s) char should be associated with.
> Tom.
> On 11/29/22 1:08 AM, Daniel Krügler wrote:
> On Tue, Nov 29, 2022 at 05:32, Nicole Mazzuca <Nicole.Mazzuca_at_[hidden]> wrote:
> I'd point out that the exact same issue exists with path(u8string), we've just made life more painful for people who do need to convert utf-8 to paths. (i.e., Windows people).
> Nicole
> Thanks for all the feedback, Nicole, Steve, and Casey. I will now open
> an LWG issue about this.
> - Daniel
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
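
For readers following the quoted P0482R6 motivation, the following is a minimal sketch (C++20) of the "one interface" idea: with `char8_t`, UTF-8 input goes through the ordinary `path` constructor, the same one used for other character types. The function names `from_utf8`, `from_utf16`, and `check_same_path` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// UTF-8 input: a char8_t literal is passed to the ordinary constructor,
// so no separate factory such as u8path is involved.
fs::path from_utf8() { return fs::path(u8"data/caf\u00E9.txt"); }

// UTF-16 input: same constructor, different character type.
fs::path from_utf16() { return fs::path(u"data/caf\u00E9.txt"); }

// Usage sketch: both spellings are expected to name the same path,
// whatever the native encoding of the platform is.
void check_same_path() { assert(from_utf8() == from_utf16()); }
```

This only covers text that is already typed as `char8_t`; UTF-8 held in `char`, which is the case Nicole raises, still needs either the deprecated `u8path` or a conversion to `char8_t` first.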

Received on 2022-11-29 16:02:16