A summary and notes from the event follow. Please note that I did not attempt to capture everything.

The recorded event will be made available at https://www.youtube.com/channel/UCQNrSepJnz8BjWT7lrKH9Tw. I will send an update when/if I become aware that the recorded event is available.

Several themes were echoed across the presentations.

The first is that the Unicode Consortium (UC) manages three main projects. The Unicode Standard provides a specification of characters, scripts, encodings, character properties, algorithms, and more. The Common Locale Data Repository (CLDR) provides a specification of languages, locales, regions, and cultural conventions. International Components for Unicode (ICU), and now ICU4X, provide portable libraries that enable applications to support internationalization, localization, and much more. Each project builds on top of the previous ones.

The second theme is that the UC invites involvement. Each of the project presentations explained how to get involved and what opportunities are available. It was noted that opportunities are not limited to those with deep language experience! There are opportunities for translators, linguists, researchers, PMs, technical writers, UI designers, and of course, programmers!

An Introduction to Internationalization (i18n) - Addison Phillips, Internationalization Engineer

Addison's presentation, as its title suggests, provided an introduction to internationalization. It included a discussion of differences between written languages, graphs of language use, examples of collation and other cultural differences, and definitions of internationalization and localization. Message formatting was also discussed. The presentation ended with a set of commitments that programmers are encouraged to adhere to when writing software. Those are:

  1. Use i18n best practices.
  2. Use Unicode.
  3. Use locales.
  4. Use resources (e.g., language resource bundles).
  5. Use message formatting APIs.
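
As a rough illustration of commitments 3 through 5, the sketch below looks up a message pattern by locale and fills in named placeholders. The `MESSAGES` table, its contents, and the `format_message` helper are hypothetical names made up for this note and were not shown in the presentation. Python's `str.format` stands in for a real message formatting API (such as ICU's MessageFormat), which would also handle plurals, gender, and number formatting.

```python
# Hypothetical resource bundles: one table of message patterns per locale.
MESSAGES = {
    "en": {"upload": "{user} uploaded {count} files to {folder}."},
    "de": {"upload": "{user} hat {count} Dateien in {folder} hochgeladen."},
    "ja": {"upload": "{user}さんが{folder}に{count}個のファイルをアップロードしました。"},
}


def format_message(locale, key, **args):
    """Find the pattern for `locale`, falling back to English, and fill it in."""
    bundle = MESSAGES.get(locale, MESSAGES["en"])
    pattern = bundle.get(key, MESSAGES["en"][key])
    # Named placeholders let each translation order the arguments as it needs.
    return pattern.format(**args)


if __name__ == "__main__":
    for loc in ("en", "de", "ja"):
        print(format_message(loc, "upload", user="Ana", count=3, folder="Docs"))
```

Because the placeholders are named rather than concatenated in code, the Japanese pattern can place the folder before the count without any change to the calling code.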

Overview of the Unicode Consortium: History and Future - Mark Davis, Cofounder and President

Mark provided an overview of the three main UC projects listed above, the organization of the UC, a timeline of significant events in the development of Unicode and of the UC, and ongoing work that we can expect to see more of in the future.

Some of the mentioned timeline events included:

  1. The invention of writing systems ~3400 BC.
  2. The standardization of ASCII in 1963.
  3. The standardization of Unicode in 1991.
  4. The introduction of Unicode character properties in 1995.
  5. The introduction of the CLDR in 2003.
  6. The Adopt a Character program in 2015 and funding for digitally disadvantaged languages.
  7. The inclusion of ICU as a UC project in 2016.

The UC is organized around the UC projects; the following committees and sub-committees were discussed.

Future work we can expect to see coming out of the UC includes:

Scripts and Character Encoding - Deborah Anderson, Chair of the Script Ad Hoc Committee

Deborah explained the role of the Unicode Technical Committee (UTC) subcommittees: to study and review proposals and to make recommendations to the UTC. Much of the presentation focused on the work of the Script Ad Hoc Committee:

Deborah highlighted the importance of work to adopt and improve script support:

The Common Locale Data Repository (CLDR) - Mark Davis and Annemarie Apple, Chair and Vice Chair of the CLDR Committee

Mark and Annemarie explained that the CLDR is provided by all major operating systems.

Example capabilities that the CLDR provides were demonstrated. These included:

A language survey tool is used to help build the CLDR data. The tool enables language researchers to produce a consensus-driven specification of cultural expectations for a given locale. Note that multiple locales may use the same language but have different expectations for how the language is written. Examples of what the tool can be used to specify include:

The CLDR itself consists of data in a structured format, specifications for how to use that data, and release overviews.

Language and locale support is characterized by how fully the CLDR supports it. There are four categories:

The CLDR exists to protect investment in written languages, prioritize language support improvements, provide interoperability, and acknowledge digitally disadvantaged languages.

There are opportunities to contribute to the CLDR for translators, linguists, language researchers, project managers, tech writers, UI designers, and programmers.

International Components for Unicode (ICU) - Markus Scherer, Chair of ICU Committee

Markus provided a demonstration of locale-dependent collation using the ICU online tool available at https://icu4c-demos.unicode.org/icu-bin/collation.html. The demonstration showed how the order in which a list of names is presented changes based on locale selection.
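
To give a feel for the effect shown in the demonstration, here is a toy sketch of locale-dependent ordering. It does not use ICU or CLDR data: the `make_key` helper, the hand-written alphabets, and the sample names are assumptions made up for this note, and they cover only the well-known difference that German dictionary order treats ä, ö, and ü like their base letters while Swedish treats å, ä, and ö as separate letters after z.

```python
LATIN = "abcdefghijklmnopqrstuvwxyz"


def make_key(alphabet, fold=None):
    """Build a sort key from a toy alphabet (not real CLDR collation data).

    `fold` maps a letter to the letter it should be ranked as.
    Input is assumed to use precomposed (NFC) characters.
    """
    fold = fold or {}
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(name):
        lowered = name.lower()
        # Letters missing from the alphabet sort after every listed letter.
        primary = [rank.get(fold.get(ch, ch), len(alphabet)) for ch in lowered]
        # The lowered string breaks ties between, for example, "a" and "ä".
        return (primary, lowered)

    return key


# German dictionary order: umlauts rank with their base letters.
german_key = make_key(LATIN, fold={"ä": "a", "ö": "o", "ü": "u"})
# Swedish: å, ä, ö are distinct letters that follow z.
swedish_key = make_key(LATIN + "åäö")

names = ["Zorn", "Ärlig", "Adam", "Östen", "Olsson"]
german_order = sorted(names, key=german_key)
swedish_order = sorted(names, key=swedish_key)

if __name__ == "__main__":
    print("German: ", german_order)
    print("Swedish:", swedish_order)
```

With the German key, Ärlig follows Adam and Östen follows Olsson; with the Swedish key, both names move after Zorn. A real application would rely on ICU's collators, which implement the full multi-level comparison and the CLDR tailorings for each locale.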

Major benefits of ICU include:

ICU originated in the 1990s. As such, like all long-lived products, it has acquired technical debt and its interfaces reflect the design principles customary at the time.

ICU4X is intended to provide more modern interfaces. ICU4X is not intended to replace ICU.

Bringing Internationalization to More Programming Languages and Resource-Constrained Environments (ICU4X) - Shane Carr, Chair of ICU4X Subcommittee

Shane explained that there is a need to provide i18n support for more programming languages, for smaller devices, and for client-side frameworks where ICU is not always a good fit.

ICU4X is written in Rust and designed to be lightweight, portable, and secure. Benefits of these goals were described as:

Some key decisions contributed to the ICU4X effort:

Version 1.0 was just released.

An online demo that illustrates fixed decimal formatting, date and time formatting, and word segmentation was shown.

Q & A

Mark Davis participated in a Q & A session following the presentations. At one point, he was asked what he is most proud of. He answered, "the idea of Unicode" and explained that, before Unicode, the proliferation of code pages produced a disaster with effects that can still be seen to this day. A truth that we in SG16 know all too well!

Tom.

On 9/12/22 4:45 PM, Tom Honermann via SG16 wrote:

The Unicode Consortium will be hosting an ~2 hour free online event on Wednesday, September 28th, 2022 at 16:30 UTC (timezone conversion).

The topics and speakers include:

  1. An Introduction to Internationalization (i18n) - Addison Phillips, Internationalization Engineer
  2. Overview of the Unicode Consortium: History and Future - Mark Davis, Cofounder and President
  3. Scripts and Character Encoding - Deborah Anderson, Chair of the Script Ad Hoc Committee
  4. The Common Locale Data Repository (CLDR) - Mark Davis and Annemarie Apple, Chair and Vice Chair of the CLDR Committee
  5. International Components for Unicode (ICU) - Markus Scherer, Chair of ICU Committee
  6. Bringing Internationalization to More Programming Languages and Resource-Constrained Environments (ICU4X) - Shane Carr, Chair of ICU4X Subcommittee

Additional details and registration information are available here.

Tom.