Date: Wed, 14 Sep 2022 03:44:56 +0200
After more looking, we managed to have no fewer than 4 distinct Unicode
versions referenced.
Good news:
The normative reference is floating and correct
(http://eel.is/c++draft/intro.refs), so XID_ properties point to 15.
Less good news: ISO/IEC 10646 currently points to 13.0, so technically the
names for \N{} do not cover 14/15. Fortunately, implementers derive the
names from UnicodeData.txt.
The non-normative reference (bibliography) needs fixing editorially(?).
I made a PR: https://github.com/cplusplus/draft/pull/5826
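
As a rough illustration of what "derive the names from UnicodeData.txt"
can look like, here is a minimal sketch. The function name and the sample
lines are my own assumptions for illustration and are not taken from any
compiler's sources. It relies only on the documented file layout
(semicolon-separated fields, code point first, name second), and it
ignores name aliases and algorithmically derived names such as those of
CJK ideographs.

```python
# Hypothetical helper for illustration only; not from any compiler's sources.
def parse_unicode_data(lines):
    """Map character names to code points from UnicodeData.txt-style lines."""
    names = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        code_point = int(fields[0], 16)
        name = fields[1]
        # Entries such as "<control>" or "<CJK Ideograph, First>" are
        # placeholders or range markers, not usable character names.
        if name.startswith("<"):
            continue
        names[name] = code_point
    return names


# A few lines in the UnicodeData.txt format, used as sample input.
SAMPLE = [
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "1F600;GRINNING FACE;So;0;ON;;;;;N;;;;;",
]

table = parse_unicode_data(SAMPLE)
print(hex(table["LATIN CAPITAL LETTER A"]))
```

Feeding it the UnicodeData.txt of a newer Unicode release picks up the
new names without any other change, which is why the stale ISO/IEC 10646
reference is mostly a wording problem.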
On Wed, Sep 14, 2022 at 2:39 AM Steve Downey <sdowney_at_[hidden]> wrote:
> Would it be worth having a note making it clear that compilers use only
> stable parts of the spec, and that it's safe to upgrade all of them
> consistently?
>
>
>
> On Tue, Sep 13, 2022, 20:30 Corentin Jabot <corentinjabot_at_[hidden]>
> wrote:
>
>>
>>
>> On Wed, Sep 14, 2022, 02:19 Steve Downey via SG16 <sg16_at_[hidden]>
>> wrote:
>>
>>> What implementation headaches would it cause to bump our Unicode
>>> reference? TR31 identifier characters and named escapes might be impacted?
>>> Neither is hard, but both are still some work.
>>>
>>
>> Expect Clang to support Unicode 15 soon; I did the PR earlier today.
>> In general, because Unicode releases faster than C++, vendors should
>> update without waiting for the standard to catch up, in order to support
>> new code points.
>>
>> How many references to Unicode do we have?
>> Most of the wording sadly relies on ISO 10646, which doesn't get updated
>> as fast.
>>
>>
>>> ---------- Forwarded message ---------
>>> From: announcements via announcements <announcements_at_[hidden]>
>>> Date: Tue, Sep 13, 2022, 17:40
>>> Subject: Announcing The Unicode® Standard, Version 15.0
>>> To: <announcements_at_[hidden]>
>>> Cc: announcements <announcements_at_[hidden]>
>>>
>>>
>>> Version 15.0 of the Unicode Standard is now
>>> available, including the core specification, annexes, and data files. This
>>> version adds 4,489 characters, bringing the total to 149,186 characters.
>>> These additions include two new scripts, for a total of 161 scripts, along
>>> with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean)
>>> ideographs. The new scripts and characters in Version 15.0 add support for
>>> modern language groups including:
>>>
>>> - Nag Mundari
>>> <https://www.unicode.org/charts/PDF/Unicode-15.0/U150-1E4D0.pdf>, a
>>> modern script used to write Mundari, a language spoken in India
>>> - A Kannada character used to write Konkani, Awadhi, and Havyaka
>>> Kannada in India
>>> - Kaktovik
>>> <https://www.unicode.org/charts/PDF/Unicode-15.0/U150-1D2C0.pdf>
>>> numerals, devised by speakers of Iñupiaq in Kaktovik, Alaska for the
>>> counting systems of the Inuit and Yupik languages
>>>
>>> Among the popular symbol additions are 20 new emoji, including hair
>>> pick, maracas, jellyfish, khanda, and pink heart. For the full list of new
>>> emoji characters, see emoji additions for Unicode 15.0
>>> <https://unicode.org/emoji/charts-15.0/emoji-released.html>, and Emoji
>>> Counts <https://www.unicode.org/emoji/charts-15.0/emoji-counts.html>.
>>> For a detailed description of support for emoji characters by the Unicode
>>> Standard, see UTS #51, Unicode Emoji
>>> <https://www.unicode.org/reports/tr51/tr51-23.html>.
>>>
>>> Other symbol and notational additions include:
>>>
>>> - The nine pointed white star, used by members of the Bahá’í faith
>>> - Eight symbols for celestial bodies
>>> <http://blog.unicode.org/2022/05/out-of-this-world-new-astronomy-symbols.html>,
>>> used by astronomers and astrologers
>>> - Twenty-nine additional Egyptian hieroglyph format controls
>>> <https://www.unicode.org/charts/PDF/Unicode-15.0/U150-13430.pdf>,
>>> which will enable Egyptologists to better represent texts
>>>
>>> Support for other languages and scholarly work includes:
>>>
>>> - Kawi
>>> <https://www.unicode.org/charts/PDF/Unicode-15.0/U150-11F00.pdf>, a
>>> historical script found in Southeast Asia, used to write Old Javanese and
>>> other languages
>>> - Three additional characters for the Arabic script to support
>>> Quranic marks used in Turkey
>>> - Three Khojki characters found in handwritten and printed documents
>>> - Ten Devanagari characters used to represent auspicious signs found
>>> in inscriptions and manuscripts
>>> - Six Latin letters used in Malayalam transliteration
>>> - Sixty-three Cyrillic modifier letters used in phonetic
>>> transcription
>>>
>>> Important chart font updates include:
>>>
>>> - A set of updated glyphs for Egyptian
>>> <https://www.unicode.org/charts/PDF/Unicode-15.0/U150-13000.pdf>
>>> hieroglyphs, in addition to standardized variation sequences to support
>>> rotated glyphs found in texts
>>> - Improved glyphs for Unified Canadian Aboriginal Syllabics
>>> <https://blog.unicode.org/2022/06/working-with-local-communities-to.html>,
>>> which provide better support for Carrier and other languages
>>> - A new Wancho
>>> <https://www.unicode.org/charts/PDF/Unicode-15.0/U150-1E2C0.pdf>
>>> font, with improved and simplified shapes
>>>
>>> Updates to the CJK blocks add:
>>>
>>> - 4,192 ideographs in the new CJK Unified Ideographs Extension H
>>> block
>>> - One ideograph in the CJK Unified Ideographs Extension C block
>>>
>>> Unicode properties and specifications determine the behavior of text on
>>> computers and phones. The following six Unicode Standard Annexes and
>>> Technical Standards have noteworthy updates for Version 15.0:
>>>
>>> - UAX #9 <https://www.unicode.org/reports/tr9/tr9-46.html>, Unicode
>>> Bidirectional Algorithm, amends the note in UAX9-C2 to emphasize the use of
>>> higher-level protocols to mitigate potential source code spoofing attacks.
>>> - UAX #31 <https://www.unicode.org/reports/tr31/tr31-37.html>,
>>> Unicode Identifier and Pattern Syntax, provides more guidance on profiles
>>> for default identifiers, clarifies the use of default ignorable code points
>>> in identifiers, and discusses the relationship between Pattern_White_Space
>>> and bidirectional ordering issues in programming languages.
>>> - UAX #38 <https://www.unicode.org/reports/tr38/tr38-33.html>,
>>> Unicode Han Database, adds the kAlternateTotalStrokes property. The kCihaiT
>>> property’s category was changed to Dictionary Indices, the kKangXi property
>>> was expanded, and Sections 3.0, 3.10, and 4.5 were added.
>>> - UTS #39 <https://www.unicode.org/reports/tr39/tr39-26.html>,
>>> Unicode Security Mechanisms, changes the zero width joiner (ZWJ) and zero
>>> width non-joiner (ZWNJ) characters from Identifier_Status=Allowed to
>>> Identifier_Status=Restricted; they are therefore no longer allowed by the
>>> General Security Profile by default.
>>> - UAX #45 <https://www.unicode.org/reports/tr45/tr45-27.html>,
>>> U-Source Ideographs, has records for new ideographs in its data file,
>>> “ExtH” was added as a new status, the status identifiers for the existing
>>> CJK Unified Ideographs blocks were improved, and Section 2.5 was added.
>>> - UTS #46 <https://www.unicode.org/reports/tr46/tr46-29.html>,
>>> Unicode IDNA Compatibility Processing, clarified the edge case of the empty
>>> label in ToASCII and added documentation regarding the new IDNA derived
>>> property data files.
>>>
>>> About the Unicode Standard
>>>
>>> The Unicode Standard provides the basis for
>>> processing, storage and seamless data interchange of text data in any
>>> language in all modern software and information technology protocols. It
>>> provides a uniform, universal architecture and encoding for all languages
>>> of the world, with over 140,000 characters currently encoded.
>>>
>>> Unicode is required by modern standards such as XML, Java, C#,
>>> ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official
>>> way to implement ISO/IEC 10646. It is a fundamental component of all modern
>>> software.
>>>
>>> For additional information on the Unicode Standard, please visit
>>> https://home.unicode.org/.
>>>
>>> About the Unicode Consortium
>>>
>>> The Unicode Consortium is a non-profit
>>> organization founded to develop, extend and promote use of the Unicode
>>> Standard and related globalization standards.
>>> The membership of the consortium represents a broad spectrum of
>>> corporations and organizations, many in the computer and information
>>> processing industry. Members include: Adobe, Amazon, Apple, Emojipedia,
>>> Google, Government of Bangladesh, International Emerging Technology Company
>>> (ETCO), Meta, Microsoft, Netflix, Salesforce, SAP, Tamil Virtual Academy,
>>> The University of California (Berkeley), Yat Labs, plus well over a hundred
>>> Associate, Liaison, and Individual members. For a complete member list go
>>> to https://home.unicode.org/membership/members/.
>>> For more information, please contact the Unicode Consortium
>>> https://home.unicode.org/connect/contact-unicode/.
>>>
>>> ------------------------------
>>> *Over 144,000 characters are available for adoption
>>> <https://www.unicode.org/consortium/adopt-a-character.html> to help the
>>> Unicode Consortium’s work on digitally disadvantaged languages*
>>>
>>>
>>> https://blog.unicode.org/2022/09/announcing-unicode-standard-version-150.html
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>
>>
Received on 2022-09-14 01:45:10