sg16

Re: [isocpp-core] Seeking ISO guidance regarding referencing the Unicode Standard in lieu of ISO/IEC 10646.

From: Corentin <corentin.jabot_at_[hidden]>
Date: Tue, 8 Nov 2022 09:04:56 -1000
Thanks Thomas for the quick response.
This is excellent news.
We probably want SG16 to confirm the direction as there were some dissenting
voices.
If we decide to move forward, I'm happy to review the terminology/write a
paper for that (along with replacing the term "translation set", if SG16 gets
consensus on that).

UCS scalar value => Unicode scalar value
character (where we do mean character) => abstract character
code point/code unit => unchanged.

If we think there are definitions in Unicode that lack precision, I think
we can talk to Unicode folks about improving them.
In the past we have had to ask ISO 10646 to align some definitions with
Unicode as important details were missing,
so a lack of clarity would not be a new problem.

On Tue, Nov 8, 2022 at 7:10 AM Thomas Köppe via Core <core_at_[hidden]>
wrote:

> Hi Tom,
>
> Corentin and Jens had already informed me of this issue, and I talked to a
> few people, and we briefly discussed this in Core yesterday.
>
> On Tue, 8 Nov 2022 at 04:51, Tom Honermann <tom_at_[hidden]> wrote:
>
>> The following NB comments have to do with difficulties we're facing in
>> referring to Unicode features in the C++ standard. See also a rejected
>> attempt to resolve these matters editorially here
>> <https://github.com/cplusplus/draft/pull/5826>.
>>
>> - FR-010-133 <https://github.com/cplusplus/nbballot/issues/412>
>> [Bibliography] Unify references to Unicode
>> - FR-021-013 <https://github.com/cplusplus/nbballot/issues/423>
>> 5.3p5.2 [lex.charset] Codepoint names in identifiers
>>
>> The issue that we are facing is that
>>
>> 1. ISO/IEC 10646 specifies only a portion of the features specified
>> in the Unicode Standard, and
>> 2. The C++ standard has normative dependencies on features from the
>> Unicode Standard that are not specified by ISO/IEC 10646, and
>> 3. Use of an ISO/IEC 10646 standard that is not aligned with a use of
>> a Unicode Standard results in problems like that reported in FR-021-013.
>>
>> The ISO requires normative references to refer to an ISO standard when
>> one is available as stated in section 10.2, Permitted referenced
>> documents, in part 2 of the ISO/IEC Directives
>> <https://www.iso.org/sites/directives/current/part2/index.xhtml#_idTextAnchor130>.
>> That has so far been applied such that, when a Unicode feature is available
>> in ISO/IEC 10646, we refer to that standard for that functionality; when
>> not, we refer to the Unicode Standard. This has resulted in the situation
>> that, as FR-010-133 states, we arguably reference, directly or indirectly,
>> up to four distinct versions of the Unicode Standard.
>>
>> Referring to both ISO/IEC 10646 and the Unicode Standard creates a burden
>> with regard to how to align those references in order to reference a
>> consistent set of Unicode features. The Unicode Standard is released once
>> per year. ISO/IEC 10646 is released every three years, but amendments are
>> issued to align the current release with new Unicode Standards as they
>> occur. Some clever wording, thanks to section 10.4, Undated references,
>> of the ISO/IEC Directives
>> <https://www.iso.org/sites/directives/current/part2/index.xhtml#_idTextAnchor134>,
>> would permit an undated reference to ISO/IEC 10646 to apply to the most
>> recent amendment of that standard and enable a similarly undated reference
>> to the Unicode Standard such that the references are aligned. That would
>> solve the problem to a certain degree, but 1) requires us to provide
>> wording that makes the relationship between the standards sufficiently
>> clear, 2) retains a burden on implementors of having to consult both
>> standards, and 3) puts us in the middle of any discrepancies found between
>> the two standards.
>>
>> SG16 discussed these issues during its 2022-11-02 telecon
>> <https://github.com/sg16-unicode/sg16-meetings#november-2nd-2022>. A
>> summary pertaining to these two NB comments is present in a GitHub issue
>> comment for FR-010-133
>> <https://github.com/cplusplus/nbballot/issues/412#issuecomment-1304696400>.
>> SG16 took two polls, one of which demonstrated consensus for exploring
>> whether we could discontinue referring to ISO/IEC 10646 in favor of
>> referring only to the Unicode Standard. For ease of reference, here is the
>> poll:
>>
>> - Poll 3: [FR-010-133][FR-021-013]: SG16 requests that the project
>> editor discuss with the ISO the option of eschewing references to ISO/IEC
>> 10646 in favor of the Unicode Standard both for technical consistency and
>> release frequency.
>> - Attendees: 9 (1 abstention)
>> -
>>   SF   F    N    A    SA
>>   3    3    0    1    1
>>
>> The consensus was not unanimous; there is some demand for more rigorous
>> specification as demonstrated in ISO/IEC 10646 relative to the Unicode
>> Standard.
>>
>> That brings us to you, Thomas. As project editor, what options do you
>> see? Is an argument that ISO/IEC 10646 is not (currently) suitable for our
>> purposes due to its limited scope sufficient to replace references to it
>> with references to the Unicode Standard? Is this a question that we would
>> have to take up with the ISO directly and, if so, can we do so?
>>
> I think a reasonable position for us is to first and foremost do what is
> right for the Standard: if we need the references to the Unicode Standard
> for completeness and correctness, we will cite that reference, and if for
> consistency it means that we should also use the Unicode Standard for
> things that would also be available from ISO 10646, but it is easier to
> specify and for implementers to follow if we take everything from the same
> reference, then I think we have a good and defensible case for "the absence
> of appropriate ISO or IEC documents": ISO 10646 is simply not appropriate
> for our purposes. Given that we are already citing the Unicode Standard
> normatively, I think we can be confident that all the other conditions in
> 10.2 are already met, and it sounds like a possible outcome would be
> that we simply _remove_ the reference to ISO 10646, which would also be an
> improvement for our users (who would need to obtain fewer documents).
>
> At this point I'd be happy for us to simply proceed in this direction and
> assume it is acceptable. Worst case ISO could send us a DIS comment about
> this, and we could revisit the decision at that point. If you prefer, I can
> contact the ISO secretariat proactively and ask about this, but I don't
> think that is necessary at this point and would only do it if you
> explicitly wanted me to.
>
> Please note that in the Core discussion yesterday we noticed some
> consequences on which I have no opinion, but which you should probably take
> into careful consideration:
>
> - Allegedly the Unicode Standard does not define terms in the same way
> and at the same level of precision as ISO 10646 does, so if by changing
> references we would lose some external definitions that we are currently
> relying on, we would need to find a suitable alternative. (I don't know
> anything about that detail regarding either the Unicode Standard or ISO
> 10646.)
> - We thought that the term "UCS scalar value" is something we
> currently use from the undated ISO 10646 reference, and allegedly that term
> does not exist in the Unicode Standard.
>
> I'm not familiar with the subject matter, so if you do make changes in the
> proposed direction, I would appreciate it if you checked all of this
> carefully. Do let me know if you would also like me to check at the end,
> but it'd be great if SG16 could make sure that all terms are properly
> defined once we lose the undated reference to ISO 10646.
>
> Let me know if you'd like to discuss anything else!
>
> Thomas
> _______________________________________________
> Core mailing list
> Core_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2022/11/13453.php
>

Received on 2022-11-08 19:05:10