Date: Fri, 28 May 2021 09:01:55 +0200
On Fri, May 28, 2021, 02:22 Steve Downey via SG16 <sg16_at_[hidden]>
wrote:
> IFNDR is definitely too far. I think even weak library UB is too far. The
> function behaves perfectly deterministically even though it's not the
> desired result. Mojibake is not UB. That we've improved the descriptive
> language so we can discuss "literal encoding" hasn't changed the behavior
> of anything.
>
It was always UB; we need to state it!
- Text has meaning. If the intent of a program is to present information to
users and it fails to do so, the program doesn't work as intended. We
shouldn't try to make this okay.
- Outputting something that is not text to a terminal can leave that
terminal botched, because the terminal itself has preconditions. It can also
raise security concerns when the component you are talking to has a narrow
contract and assumes you will feed it UTF-8.
>
> `sprintf` has had this behavior forever.
>
Sure, or maybe not.
The C standard specifies those functions as character functions and
further mentions character devices. This would imply an assumption about
encoding.
> Since the format is user extensible, I don't think any assumptions can be
> made about the result of print, or format, being well formed for any
> encoding that has well formed conditions. For the UTF-8 to UTF-16 case,
> print has to do _something_, so it might as well try to do something
> sensible. In most cases, though, it should be doing nothing.
>
Can you explain what the format being extensible has to do with it?
There is
- a precondition that the format string is in the literal encoding
- a postcondition that the output buffer is in the encoding of the
terminal when displaying on a terminal.
Because we don't know the encoding of the format string and can only assume
it, it is not possible to do something sensible, or to know whether what we
do is sensible or not.
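
To make the limits of that assumption concrete, here is a minimal sketch of a
UTF-8 well-formedness check. The name `is_well_formed_utf8` is hypothetical and
is not taken from any proposal or from the standard library; the byte ranges
follow the well-formed UTF-8 byte sequence table in the Unicode Standard. The
point it illustrates: a check like this can reject bytes that are certainly not
UTF-8, but it cannot tell us which encoding the author of the string intended.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, for illustration only: returns true if `s` is a
// well-formed UTF-8 byte sequence (no overlong forms, no surrogates,
// nothing above U+10FFFF, no truncated sequences).
bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) {            // ASCII: single byte
            ++i;
            continue;
        }
        std::size_t len = 0;         // total length of this sequence
        unsigned char lo = 0x80;     // allowed range for the second byte
        unsigned char hi = 0xBF;
        if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; }
        else if (b0 == 0xE0) { len = 3; lo = 0xA0; }   // excludes overlong forms
        else if (b0 >= 0xE1 && b0 <= 0xEC) { len = 3; }
        else if (b0 == 0xED) { len = 3; hi = 0x9F; }   // excludes surrogates
        else if (b0 >= 0xEE && b0 <= 0xEF) { len = 3; }
        else if (b0 == 0xF0) { len = 4; lo = 0x90; }   // excludes overlong forms
        else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 4; }
        else if (b0 == 0xF4) { len = 4; hi = 0x8F; }   // caps at U+10FFFF
        else { return false; }       // 0x80..0xC1 and 0xF5..0xFF never lead

        if (n - i < len) return false;                 // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;    // must be a continuation byte
        }
        i += len;
    }
    return true;
}
```

With this sketch, the single byte 0xE9 ("é" in Latin-1) is rejected, while the
bytes 0xC3 0xA9 are accepted. Those two bytes are "é" in UTF-8 but also a
perfectly valid two-character string in Latin-1, so passing the check does not
establish that the string was meant as UTF-8.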
>
> On Wed, May 26, 2021 at 11:05 PM Hubert Tong via SG16 <
> sg16_at_[hidden]> wrote:
>
>> On Wed, May 26, 2021 at 5:46 PM Corentin Jabot via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>>
>>>
>>> On Wed, May 26, 2021 at 11:01 PM Peter Brett via SG16 <
>>> sg16_at_[hidden]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I prepared these 2 polls for tonight's meeting, but we did not have the
>>>> opportunity to vote on them. I hope we will be able to vote on them next
>>>> time. Please suggest alternative or additional polls that you think would
>>>> be informative!
>>>>
>>>> Peter
>>>>
>>>>
>>>> If, and only if, the literal encoding is UTF-8, <print> facilities
>>>> should assume
>>>> that their formatted results are UTF-8 text.
>>>>
>>>> SF F N A SA
>>>>
>>>> Attendance:
>>>>
>>>> Consensus:
>>>>
>>>> Author's position:
>>>>
>>>>
>>>> If a <print> facility assumes that the result of formatting is UTF-8
>>>> text, but
>>>> it is not, then the program is ill-formed, no diagnostic required.
>>>>
>>>
>>> Please make it UB
>>>
>>
>> +1
>>
>>
>>> IFNDR is really nasty and should be reserved for things that the
>>> compiler has no real control over like ODR issues.
>>>
>>
>> IFNDR can only be applied based on the source code. IFNDR cannot be
>> applied depending on user input at run-time.
>>
>>
>>>
>>> Library has wording stating that preconditions violations are UB
>>> http://eel.is/c++draft/library#structure.specifications-3.3
>>> I think having wording along the line of "Precondition: The format
>>> string is UTF-8 encoded" in the wording of vprint_unicode is the best way
>>> to achieve this intent.
>>>
>>
>> The above won't cover strings provided by the user for replacements (and
>> that's probably just the easiest case to identify).
>>
>>
>>>
>>>
>>>>
>>>> SF F N A SA
>>>>
>>>> Attendance:
>>>>
>>>> Consensus:
>>>>
>>>> Author's position:
>>>>
>>>> --
>>>> SG16 mailing list
>>>> SG16_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2021-05-28 02:02:11