Date: Mon, 7 Nov 2022 23:51:40 -0500
Hi, Thomas.
The following NB comments have to do with difficulties we're facing in
referring to Unicode features in the C++ standard. See also a rejected
attempt to resolve these matters editorially here
<https://github.com/cplusplus/draft/pull/5826>.
* FR-010-133 <https://github.com/cplusplus/nbballot/issues/412>
[Bibliography] Unify references to Unicode
* FR-021-013 <https://github.com/cplusplus/nbballot/issues/423>
5.3p5.2 [lex.charset] Codepoint names in identifiers
The issue that we are facing is that
1. ISO/IEC 10646 specifies only a portion of the features specified in
the Unicode Standard, and
2. The C++ standard has normative dependencies on features from the
Unicode Standard that are not specified by ISO/IEC 10646, and
3. Use of an ISO/IEC 10646 standard that is not aligned with a use of a
Unicode Standard results in problems like that reported in FR-021-013.
The ISO requires normative references to refer to an ISO standard when
one is available, as stated in section 10.2, Permitted referenced
documents, in part 2 of the ISO/IEC Directives
<https://www.iso.org/sites/directives/current/part2/index.xhtml#_idTextAnchor130>.
That has so far been applied such that, when a Unicode feature is
available in ISO/IEC 10646, we refer to that standard for that
functionality; when not, we refer to the Unicode Standard. This has
resulted in the situation that, as FR-010-133 states, we arguably
reference, directly or indirectly, up to four distinct versions of the
Unicode Standard.
Referring to both ISO/IEC 10646 and the Unicode Standard creates a
burden with regard to how to align those references in order to
reference a consistent set of Unicode features. The Unicode Standard is
released once per year. ISO/IEC 10646 is released every three years, but
amendments are issued to align the current release with new Unicode
Standards as they occur. Some clever wording, thanks to section 10.4,
Undated references, of the ISO/IEC Directives
<https://www.iso.org/sites/directives/current/part2/index.xhtml#_idTextAnchor134>,
would permit an undated reference to ISO/IEC 10646 to apply to the most
recent amendment of that standard and enable a similarly undated
reference to the Unicode Standard such that the references are aligned.
That would solve the problem to a certain degree, but it 1) requires us to
provide wording that makes the relationship between the standards
sufficiently clear, 2) retains a burden on implementors of having to
consult both standards, and 3) puts us in the middle of any
discrepancies found between the two standards.
SG16 discussed these issues during its 2022-11-02 telecon
<https://github.com/sg16-unicode/sg16-meetings#november-2nd-2022>. A
summary pertaining to these two NB comments is present in a GitHub issue
comment for FR-010-133
<https://github.com/cplusplus/nbballot/issues/412#issuecomment-1304696400>.
SG16 took two polls, one of which demonstrated consensus for exploring
whether we could discontinue referring to ISO/IEC 10646 in favor of
referring only to the Unicode Standard. For ease of reference, here is
the poll:
* Poll 3: [FR-010-133][FR-021-013]: SG16 requests that the project
editor discuss with the ISO the option of eschewing references to
ISO/IEC 10646 in favor of the Unicode Standard both for technical
consistency and release frequency.
o Attendees: 9 (1 abstention)
o   SF    F    N    A   SA
     3    3    0    1    1
The consensus was not unanimous; there is some demand for more rigorous
specification as demonstrated in ISO/IEC 10646 relative to the Unicode
Standard.
That brings us to you, Thomas. As project editor, what options do you
see? Is an argument that ISO/IEC 10646 is not (currently) suitable for
our purposes due to its limited scope sufficient to replace references
to it with references to the Unicode Standard? Is this a question that
we would have to take up with the ISO directly and, if so, can we do so?
I'll be happy to answer any further questions you might have.
Tom.
Received on 2022-11-08 04:51:42