Version 15.0 of the Unicode Standard is now
available, including the core specification, annexes, and data
files. This version adds 4,489 characters, bringing the total to
149,186 characters. These additions include two new scripts, for a
total of 161 scripts, along with 20 new emoji characters, and
4,193 CJK (Chinese, Japanese, and Korean) ideographs. The new
scripts and characters in Version 15.0 add support for modern
language groups including:
- Nag
Mundari, a modern script used to write Mundari, a language
spoken in India
- A Kannada character used to write Konkani, Awadhi, and Havyaka
Kannada in India
- Kaktovik
numerals, devised by speakers of Iñupiaq in Kaktovik, Alaska for
the counting systems of the Inuit and Yupik languages
Among the popular symbol additions are 20 new emoji, including hair
pick, maracas, jellyfish, khanda, and pink heart. For the full list
of new emoji characters, see
emoji
additions for Unicode 15.0, and
Emoji
Counts. For a detailed description of support for emoji
characters by the Unicode Standard, see
UTS #51,
Unicode Emoji.
Other symbol and notational additions include:
Support for other languages and scholarly work includes:
- Kawi,
a historical script found in Southeast Asia, used to write Old
Javanese and other languages
- Three additional characters for the Arabic script to support
Quranic marks used in Turkey
- Three Khojki characters found in handwritten and printed
documents
- Ten Devanagari characters used to represent auspicious signs
found in inscriptions and manuscripts
- Six Latin letters used in Malayalam transliteration
- Sixty-three Cyrillic modifier letters used in phonetic
transcription
Important chart font updates include:
- A set of updated glyphs for Egyptian
hieroglyphs, in addition to standardized variation sequences to
support rotated glyphs found in texts
- Improved glyphs for Unified
Canadian Aboriginal Syllabics, which provide better
support for Carrier and other languages
- A new
Wancho font, with improved and simplified shapes
Updates to the CJK blocks add:
- 4,192 ideographs in the new CJK Unified Ideographs Extension H
block
- One ideograph in the CJK Unified Ideographs Extension C block
Unicode properties and specifications determine the behavior of text
on computers and phones. The following six Unicode Standard Annexes
and Technical Standards have noteworthy updates for Version 15.0:
- UAX
#9, Unicode Bidirectional Algorithm, amends the note in
UAX9-C2 to emphasize the use of higher-level protocols to
mitigate potential source code spoofing attacks.
- UAX
#31, Unicode Identifier and Pattern Syntax, provides more
guidance on profiles for default identifiers, clarifies the use
of default ignorable code points in identifiers, and discusses
the relationship between Pattern_White_Space and bidirectional
ordering issues in programming languages.
- UAX
#38, Unicode Han Database, adds the kAlternateTotalStrokes
property. The kCihaiT property’s category was changed to
Dictionary Indices, the kKangXi property was expanded, and
Sections 3.0, 3.10, and 4.5 were added.
- UTS
#39, Unicode Security Mechanisms, changes the zero width
joiner (ZWJ) and zero width non-joiner (ZWNJ) characters from
Identifier_Status=Allowed to Identifier_Status=Restricted; they
are therefore no longer allowed by the General Security Profile
by default.
- UAX
#45, U-Source Ideographs, has records for new ideographs
in its data file, “ExtH” was added as a new status, the status
identifiers for the existing CJK Unified Ideographs blocks were
improved, and Section 2.5 was added.
- UTS
#46, Unicode IDNA Compatibility Processing, clarified the
edge case of the empty label in ToASCII and added documentation
regarding the new IDNA derived property data files.
About the Unicode Standard
The Unicode Standard provides the basis for processing, storage and
seamless data interchange of text data in any language in all modern
software and information technology protocols. It provides a
uniform, universal architecture and encoding for all languages of
the world, with over 140,000 characters currently encoded.
Unicode is required by modern standards such as XML, Java, C#,
ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the
official way to implement ISO/IEC 10646. It is a fundamental
component of all modern software.
For additional information on the Unicode Standard, please visit
https://home.unicode.org/.
About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to
develop, extend and promote use of the Unicode Standard and related
globalization standards.
The membership of the consortium represents a broad spectrum of
corporations and organizations, many in the computer and information
processing industry. Members include: Adobe, Amazon, Apple,
Emojipedia, Google, Government of Bangladesh, International Emerging
Technology Company (ETCO), Meta, Microsoft, Netflix, Salesforce,
SAP, Tamil Virtual Academy, The University of California (Berkeley),
Yat Labs, plus well over a hundred Associate, Liaison, and
Individual members. For a complete member list go to
https://home.unicode.org/membership/members/.
For more information, please contact the Unicode Consortium
https://home.unicode.org/connect/contact-unicode/.
Over 144,000 characters are available for adoption
to help the Unicode Consortium’s work on digitally disadvantaged
languages
![[badge]]()