Date: Tue, 10 Sep 2024 00:47:14 -0400
On 9/10/24 12:17 AM, Tiago Freire wrote:
> It is extremely relevant.
Clearly we disagree on that.
>
> Let's suppose it did as you suggested, and now we added a transcoder
> (on which no progress has yet been made). What happens when the input
> cannot be transcoded to UTF-8 (which is currently possible on all
> operating systems)?
I mentioned earlier that the proposal needs to address how transcoding
errors are handled.
>
> I don't see anything the application could validly do at that point
> other than silently fail and exit.
Other options include replacing invalid code unit sequences with
substitution characters.
>
> Where previously this was never a concern, there is now a category of
> inputs that are perfectly valid to spawn applications with but invalid
> for your application to run with.
It is perfectly possible to spawn a process with command line arguments
that are not validly encoded, encoded for an encoding other than what
the program expects, or just plain not text. The transcoding layer
doesn't introduce a problem that wasn't already there; it just applies
an error handling mechanism of some sort for certain error conditions.
The proposal doesn't include deprecation of the existing main()
signature. This would be an opt-in feature only suitable for use in
situations where the programmer deems its error handling mechanisms
suitable for the program.
Programmers are always free to reject input that doesn't meet program
specifications in whatever way they feel is appropriate.
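To make the two error-handling options mentioned above concrete (substituting U+FFFD for invalid code unit sequences, or rejecting the input outright), here is a minimal C++17 sketch that checks an argument for well-formed UTF-8. The names `to_valid_utf8` and `utf8_error_policy` are hypothetical and not part of any proposal. For simplicity it emits one U+FFFD per byte that does not begin a well-formed sequence, rather than following the Unicode Standard's "maximal subpart" recommendation exactly.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical policy selector; not from any proposal.
enum class utf8_error_policy { replace, reject };

// Length of the well-formed UTF-8 sequence starting at s[i], or 0 if the
// bytes there are ill-formed (bad lead byte, overlong form, surrogate,
// code point above U+10FFFF, or truncated sequence).
std::size_t utf8_sequence_length(std::string_view s, std::size_t i) {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char b0 = byte(i);
    if (b0 < 0x80) return 1;
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; }
    else if (b0 == 0xE0)               { len = 3; lo = 0xA0; }  // no overlongs
    else if (b0 >= 0xE1 && b0 <= 0xEC) { len = 3; }
    else if (b0 == 0xED)               { len = 3; hi = 0x9F; }  // no surrogates
    else if (b0 >= 0xEE && b0 <= 0xEF) { len = 3; }
    else if (b0 == 0xF0)               { len = 4; lo = 0x90; }  // no overlongs
    else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 4; }
    else if (b0 == 0xF4)               { len = 4; hi = 0x8F; }  // max U+10FFFF
    else return 0;                      // 0x80..0xC1 and 0xF5..0xFF
    if (i + len > s.size()) return 0;   // truncated at end of input
    if (byte(i + 1) < lo || byte(i + 1) > hi) return 0;
    for (std::size_t k = 2; k < len; ++k)
        if (byte(i + k) < 0x80 || byte(i + k) > 0xBF) return 0;
    return len;
}

// Returns `in` unchanged if it is well-formed UTF-8. Otherwise, under
// `replace` each byte that does not begin a well-formed sequence becomes
// U+FFFD; under `reject` the result is std::nullopt so the caller can
// report an error in whatever way it sees fit.
std::optional<std::string> to_valid_utf8(std::string_view in, utf8_error_policy policy) {
    constexpr std::string_view replacement = "\xEF\xBF\xBD";  // U+FFFD in UTF-8
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size();) {
        if (const std::size_t n = utf8_sequence_length(in, i)) {
            out.append(in.substr(i, n));
            i += n;
        } else if (policy == utf8_error_policy::reject) {
            return std::nullopt;
        } else {
            out.append(replacement);
            ++i;
        }
    }
    return out;
}
```

A program that prefers rejection would call it with `utf8_error_policy::reject` for each argument and print a diagnostic when it gets `std::nullopt` back; one that prefers substitution would use `utf8_error_policy::replace` and carry on.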
>
> But let's say you don't like making applications that quit immediately
> as soon as you start them, and you try to define some failover
> behavior, like filtering out invalid input or replacing offending code
> points with a placeholder. In that case you had better hope that the
> offending input isn't a path to a file, because then there's no way
> your application will be able to access it.
I would argue that it is a requirement of the standard library to
facilitate access to files by whatever name they are given. However, an
application need not provide support for all such names. I don't find it
unreasonable for a program to require that files it works with have a
name that is uniquely representable in UTF-8.
>
>
> It all sounds good until you have to solve practical problems.
The proposal as offered does not match my preferred approach.
Personally, I favor an approach that facilitates both raw access and
access in common Unicode encodings with error handling facilities. I'm
just pushing back because I don't find the arguments you have presented
against the solution proposed in this thread compelling or relevant. The
only way in which the Unicode Standard is relevant to this proposal is
that it defines the UTF-8 encoding. If you were to substitute a
non-Unicode encoding for UTF-8 in what is proposed, then the Unicode
Standard would be completely irrelevant.
Tom.
>
>
> ------------------------------------------------------------------------
> *From:* Tom Honermann <tom_at_[hidden]>
> *Sent:* Monday, September 9, 2024 11:48:42 PM
> *To:* Tiago Freire <tmiguelf_at_[hidden]>
> *Cc:* std-proposals_at_[hidden] <std-proposals_at_[hidden]>
> *Subject:* Re: [std-proposals] Floating an idea: int
> main(std::span<std::string_view> args)
>
>
>> On Sep 9, 2024, at 12:34 AM, Tiago Freire <tmiguelf_at_[hidden]> wrote:
>>
>>
>> It means the file system is not Unicode, terminal input is not
>> Unicode, and the system's understanding of text (wherever that shows
>> up) is not Unicode, even in modes labeled "utf-8" or otherwise
>> suggesting that they are "Unicode".
>
> None of that is relevant for this proposal. The proposal wasn’t to
> support everything in the Unicode Standard; it was to support UTF-8.
>
>>
>> Any attempt to add such a feature would only succeed in making an
>> application that uses it incompatible with every OS.
>
> That is just incorrect. The native OS program arguments will be
> affiliated with some implementation-defined encoding. Adding a
> transcoding layer on top of them does not introduce incompatibility.
> Programmers still have to adhere to encoding expectations of any
> functions called.
>
> Tom.
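As a rough illustration of such a transcoding layer, the following C++17 sketch converts an argument from the current C locale's multibyte encoding to UTF-8 using `std::mbrtoc32`. It assumes the program has called `std::setlocale(LC_ALL, "")` (or otherwise selected the locale whose encoding matches the arguments) and that `mbrtoc32` yields UTF-32 code points, as it does when `__STDC_UTF_32__` is defined. On failure it returns `std::nullopt`, leaving the error policy to the caller; `native_to_utf8` and `append_utf8` are hypothetical names.

```cpp
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <optional>
#include <string>
#include <string_view>

// Append the UTF-8 encoding of code point `cp` (assumed <= U+10FFFF) to `out`.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Transcode one argument from the locale's multibyte encoding to UTF-8.
// Returns std::nullopt if the input is invalid or ends mid-character.
std::optional<std::string> native_to_utf8(std::string_view native) {
    std::mbstate_t state{};
    std::string out;
    const char* p = native.data();
    const char* const end = p + native.size();
    while (p < end) {
        char32_t c32 = 0;
        const std::size_t rc =
            std::mbrtoc32(&c32, p, static_cast<std::size_t>(end - p), &state);
        if (rc == static_cast<std::size_t>(-1) || rc == static_cast<std::size_t>(-2)) {
            return std::nullopt;  // invalid or incomplete sequence
        }
        append_utf8(out, c32);
        if (rc == static_cast<std::size_t>(-3)) continue;  // extra char32_t, no input consumed
        p += (rc == 0) ? 1 : rc;  // rc == 0 means an embedded null byte was read
    }
    return out;
}
```

Whether an error yields `std::nullopt`, a substitution character, or a diagnostic is exactly the policy question the proposal would need to specify.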
>
>>
>> ------------------------------------------------------------------------
>> *From:* Tom Honermann <tom_at_[hidden]>
>> *Sent:* Sunday, September 8, 2024 11:22:45 PM
>> *To:* std-proposals_at_[hidden] <std-proposals_at_[hidden]>
>> *Cc:* Tiago Freire <tmiguelf_at_[hidden]>
>> *Subject:* Re: [std-proposals] Floating an idea: int
>> main(std::span<std::string_view> args)
>>
>>
>>> On Sep 8, 2024, at 3:42 PM, Tiago Freire via Std-Proposals
>>> <std-proposals_at_[hidden]> wrote:
>>>
>>>
>>> That would actually make things worse, as no major OS vendor is
>>> Unicode compliant.
>>
>> I fail to see what an OS being Unicode compliant (whatever that
>> means; there is no Unicode specification for operating systems) has
>> to do with this.
>>
>> As long as the implementation knows what encoding to convert from, a
>> signature that provides a UTF-8 interface is possible. However, since
>> the external input might not be properly encoded, such a proposal
>> should specify how transcoding errors are handled.
>>
>> Implicitly linking to an alternate file I/O library doesn’t seem
>> realistic to me. Such a change in encoding expectations should be
>> expressed such that every translation unit is aware. Otherwise, a
>> programmer has no way of knowing what encoding to use with such
>> interfaces.
>>
>> Tom.
>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> on
>>> behalf of zxuiji via Std-Proposals <std-proposals_at_[hidden]>
>>> *Sent:* Sunday, September 8, 2024 9:14:51 PM
>>> *To:* std-proposals_at_[hidden] <std-proposals_at_[hidden]>
>>> *Cc:* zxuiji <gb2985_at_[hidden]>
>>> *Subject:* Re: [std-proposals] Floating an idea: int
>>> main(std::span<std::string_view> args)
>>>
>>> I would rather propose int umain( int argc, char8_t **argv ) { ... }
>>>
>>> Its presence would tell the compiler to slap some
>>> extra code into the startup function to activate UTF-8 mode in the
>>> terminal and link variants of open() etc. that expect UTF-8 paths.
>>> This would vastly simplify development if the interactions between
>>> the system and the program have translations handled in the
>>> background instead of forcing it onto the developer.
>>>
>>> On Sat, 7 Sept 2024 at 22:41, Andrey Semashev via Std-Proposals
>>> <std-proposals_at_[hidden]> wrote:
>>>
>>> On 9/8/24 00:21, Thiago Macieira via Std-Proposals wrote:
>>> > On Saturday 7 September 2024 22:11:32 CEST Jeremy Rifkin via
>>> > Std-Proposals wrote:
>>> >> If I understand correctly P0781 was written before std::span was
>>> >> proposed and suggested some magic std::argument_list. Now that
>>> >> there's a standard replacement for pointer+length, is it worth
>>> >> reconsidering?
>>> >
>>> > std::span requires that there be a contiguous range of the
>>> > value_types in memory somewhere. That's the problem here:
>>> > std::string_view aren't there. We could use a
>>> > std::span<std::cstring_view> but no one wants to standardise
>>> > cstring_view.
>>>
>>> cstring_view also aren't there, unless you're willing to mandate its
>>> binary representation as a single pointer and legalize type punning.
>>>
>>> --
>>> Std-Proposals mailing list
>>> Std-Proposals_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>
>>
>
Received on 2024-09-10 04:47:20