A summary and notes from the event follow. Please note that I did not attempt to capture everything.
The recorded event will be made available at https://www.youtube.com/channel/UCQNrSepJnz8BjWT7lrKH9Tw.
I will send an update when/if I become aware that the recorded
event is available.
Several themes were echoed across the presentations.
The first is that the Unicode Consortium (UC) manages three main
projects. The Unicode
Standard provides a specification of characters, scripts,
encodings, character properties, algorithms, and more. The CLDR provides a
specification of languages, locales, regions, and cultural
conventions. ICU, and now
ICU4X, provide portable
libraries that enable applications to provide support for
internationalization, localization, and much more. Each project
builds on top of the previous ones.
The second theme is that the UC invites involvement. Each of the
project presentations explained how to get involved and what
opportunities are available. It was noted that opportunities are
not limited to those with deep language experience! There are
opportunities for translators, linguists, researchers, PMs,
technical writers, UI designers, and of course, programmers!
Addison's presentation, as its title suggests, provided an introduction to topics of internationalization. Included was discussion of differences between written languages, graphs of language use, examples of collation and other cultural differences, and a definition of internationalization and localization. Message formatting was also discussed. The presentation ended with a set of commitments programmers are encouraged to adhere to when writing software. Those are:
Mark provided an overview of the three main UC projects listed
above, the organization of the UC, a timeline of significant
events that contributed to the development of Unicode and the status
of the UC, and ongoing work we can expect to see more of in the
future.
Some of the mentioned timeline events included:
The UC is organized around the UC projects; the following committees and sub-committees were discussed.
Future work we can expect to see coming out of the UC includes:
Deborah explained the role of the Unicode Technical Committee (UTC) subcommittees: to study and review proposals and to make recommendations to the UTC. Much of the presentation focused on the work of the Script Ad Hoc committee:
Deborah highlighted the importance of work to adopt and improve
script support:
Mark and Annemarie explained that the CLDR is provided by all major operating systems.
Example capabilities that the CLDR provides were demonstrated. These included:
A language survey tool is used to help build the CLDR data. The
tool enables language researchers to produce a consensus-driven
specification of cultural expectations for a given locale. Note
that multiple locales may use the same language, but have
different expectations for how the language is written. Examples
of what the tool can be used to specify include:
The CLDR itself consists of data in a structured format, specifications for how to use that data, and release overviews.
Language and locale support is characterized by how fully the
CLDR supports it. There are four categories:
The CLDR exists to protect investment in written languages, prioritize language support improvements, provide interoperability, and acknowledge digitally disadvantaged languages.
There are opportunities to contribute to the CLDR for
translators, linguists, language researchers, project managers,
tech writers, UI designers, and programmers.
Markus provided a demonstration of locale-dependent collation using the ICU online tool available at https://icu4c-demos.unicode.org/icu-bin/collation.html. The demonstration showed how the order in which a list of names is presented changes based on locale selection.
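For readers who want to reproduce the effect locally, here is a minimal sketch of locale-dependent ordering. It is not the ICU demo or the ICU API; it assumes the java.text.Collator class from the Java standard library as a stand-in, and the class name, method name, and sample names are made up for illustration. The non-ASCII letters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; uses the JDK's built-in collation rules, not ICU.
class CollationDemo {
    // Returns a copy of the names sorted by the collation rules of the locale.
    static List<String> sortFor(Locale locale, List<String> names) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // \u00C4 is A with diaeresis, \u00D6 is O with diaeresis.
        List<String> names =
            List.of("Zara", "\u00C4rla", "Adam", "\u00D6berg", "Olof");
        // Same input, two locales: only the collation rules differ.
        System.out.println("de: " + sortFor(Locale.forLanguageTag("de"), names));
        System.out.println("sv: " + sortFor(Locale.forLanguageTag("sv"), names));
    }
}
```

On a JDK that ships its full locale data, the German ordering should treat the accented letters as variants of A and O, while the Swedish ordering should place them after Z as separate letters of the alphabet.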
Major benefits of ICU include:
ICU originated in the 1990s. As such, like all long-lived products, it has acquired technical debt and its interfaces reflect the design principles customary at the time.
ICU4X is intended to provide more modern interfaces. ICU4X is not
intended to replace ICU.
Shane explained that there is a need to provide i18n support for more programming languages, for smaller devices, and for client-side frameworks where ICU is not always a good fit.
ICU4X is written in Rust and designed to be lightweight, portable, and secure. Benefits of these goals were described as:
Some key decisions contributed to the ICU4X effort:
Version 1.0 was just released.
An online demo that illustrates fixed decimal formatting, date and time formatting, and word segmentation was shown.
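To give a feel for the three capabilities shown in the demo, here is a rough sketch using the Java standard library. This is an assumption-laden stand-in, not ICU4X and not its API: the class and method names are hypothetical, and the java.text and java.time classes are used only because they are self-contained and ship with the JDK.

```java
import java.math.BigDecimal;
import java.text.BreakIterator;
import java.text.NumberFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; JDK facilities standing in for the ICU4X demo.
class I18nBasicsDemo {
    // Formats a decimal value with the locale's grouping and decimal separators.
    static String formatDecimal(BigDecimal value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Formats a date in the locale's "long" date style.
    static String formatDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG)
                .withLocale(locale)
                .format(date);
    }

    // Splits text at word boundaries, keeping only segments that contain
    // at least one letter or digit (drops spaces and punctuation).
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1234567.891");
        LocalDate eventDate = LocalDate.of(2022, 9, 28);
        for (Locale locale : List.of(Locale.US, Locale.GERMANY)) {
            System.out.println(locale + ": " + formatDecimal(value, locale)
                    + " | " + formatDate(eventDate, locale));
        }
        System.out.println(words("Unicode enables global software.", Locale.US));
    }
}
```

The same value and date produce different text per locale, and the word segmentation relies on boundary rules rather than splitting on spaces, which matters for scripts that do not separate words with whitespace.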
Mark Davis participated in a Q & A session following the
presentations. At one point, he was asked what he is most proud
of. He answered, "the idea of Unicode" and explained that, before
Unicode, the proliferation of code pages produced a disaster with
effects that can still be seen to this day. A truth that we in
SG16 know all too well!
Tom.
The Unicode Consortium will be hosting a free online event of about two hours on Wednesday, September 28th, 2022 at 16:30 UTC.
The topics and speakers include:
- An Introduction to Internationalization (i18n) - Addison Phillips, Internationalization Engineer
- Overview of the Unicode Consortium: History and Future - Mark Davis, Cofounder and President
- Scripts and Character Encoding - Deborah Anderson, Chair of the Script Ad Hoc Committee
- The Common Locale Data Repository (CLDR) - Mark Davis and Annemarie Apple, Chair and Vice Chair of the CLDR Committee
- International Components for Unicode (ICU) - Markus Scherer, Chair of ICU Committee
- Bringing Internationalization to More Programming Languages and Resource-Constrained Environments (ICU4X) - Shane Carr, Chair of ICU4X Subcommittee
Additional details and registration information are available here.
Tom.