What is Unicode?

Microsoft in the News

2019 is starting with a lot of buzz about “intelligent retail”. More and more, companies are taking advantage of AI in the cloud to ensure their customers are getting the right experience. Everything to do with retail is being reimagined.

Retailers are embracing the idea that everyone with a smartphone is a person with a shopping mall in their pocket. With this realization, retailers are tracking their customers like never before and supplementing that information with data from third parties (think Facebook and Google). Armed with this data, they process it in the cloud with AI to ensure that their customers can get what they want, when they want, delivered where they want, all at a price that maximizes profits and/or sales (depending on how you tweak the AI).

You may think you are immune, but an industry does not grow to be worth billions of dollars without producing results. We are all affected, and our children don’t stand a chance.

In addition to targeting customers, intelligent retail is also concerned with tailoring retail employees and supporting them so they can ensure better customer experiences. Your employees are still the front line in customer interaction, but only when the customer actually shows up. Sometimes, intelligent retail technology is used to organize employees so that online orders are fulfilled as quickly and accurately as possible. Kroger recently partnered with Microsoft and managed to reduce the time taken to fulfill an online order by 50%. These are the sorts of results coming out of the push for intelligent retail.

If your company sells something, or if you purchase things from time to time, intelligent retail is going to have a profound effect on your life.

What is Unicode?

When you boil it down, a computer deals exclusively with numbers, in particular zeros and ones. To store the letter “Z”, a computer needs an agreed-upon number that represents that letter.
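To make that concrete, here is a minimal Python sketch of the letter-to-number idea. The helper name code_point is my own, chosen for illustration; ord, chr and bin are Python built-ins.

```python
def code_point(ch: str) -> int:
    """Return the number Python associates with a single character."""
    return ord(ch)

z = code_point("Z")
print(z)        # 90
print(bin(z))   # 0b1011010, the same number written in zeros and ones
print(chr(z))   # Z, going from the number back to the letter
```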

Before Unicode, there were literally hundreds of competing systems (encodings) that assigned numbers to letters. Computers needed to support all or many of these in order to operate and allow us to communicate. In many instances, multiple encoding systems used the same number but assigned a different character to it. In other instances, different numbers were used to represent the same character. The risk of data corruption was high when data was translated to a different encoding or passed between computers.
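Both kinds of conflict are easy to reproduce with two legacy encodings that Python still ships. This is only a sketch; Latin-1 and code page 437 are my choice of examples, and any pair of older encodings would show similar clashes.

```python
# The same number (byte value 0xE9) read under two legacy encodings.
raw = bytes([0xE9])
as_latin1 = raw.decode("latin-1")  # é (U+00E9)
as_cp437 = raw.decode("cp437")     # Θ (U+0398)
print(as_latin1, as_cp437)

# The same character stored as two different numbers.
e_latin1 = "é".encode("latin-1")   # b'\xe9'
e_cp437 = "é".encode("cp437")      # b'\x82'
print(e_latin1, e_cp437)
```

Text saved under one of these encodings and opened under the other would silently show the wrong characters, which is the data corruption risk described above.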

Unicode standardized the system of encoding characters and has become the industry standard. Unicode Version 11.0 was released in June 2018. It contains 137,439 characters representing 146 modern and historic scripts and includes many symbol sets and even emoji.

In addition to the characters, the rules for using them are also contained in Unicode. These include rules for character properties and display order (e.g., Hebrew characters are read right to left).
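Some of these properties can be inspected from Python's standard unicodedata module. The sketch below uses a hypothetical helper, describe, to look up the name, general category and bidirectional class of a character. The Unicode version behind the data depends on your Python build.

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (name, general category, bidirectional class) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

print(describe("Z"))       # ('LATIN CAPITAL LETTER Z', 'Lu', 'L')
print(describe("\u05d0"))  # ('HEBREW LETTER ALEF', 'Lo', 'R')
```

The last field is the one that drives display order: "L" marks a left-to-right character and "R" a right-to-left one.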

Unicode’s success can be measured by its adoption rate. Currently, all modern operating systems use Unicode, as do XML, Java and the .NET Framework.

Within Unicode, just to keep things confusing, there are several formats known as Unicode Transformation Formats (UTF). These formats include:

  • UTF-1, a retired predecessor of UTF-8 that maximized compatibility with ISO 2022 (no longer part of The Unicode Standard);
  • UTF-7, a 7-bit encoding sometimes used in e-mail, often considered obsolete (not part of The Unicode Standard, but only documented as an informational RFC, i.e., not on the Internet Standards Track either);
  • UTF-8, an 8-bit variable-width encoding which maximizes compatibility with ASCII;
  • UTF-EBCDIC, an 8-bit variable-width encoding similar to UTF-8, but designed for compatibility with EBCDIC (not part of The Unicode Standard);
  • UTF-16, a 16-bit, variable-width encoding;
  • UTF-32, a 32-bit, fixed-width encoding.
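To see the difference between the variable-width and fixed-width formats, the sketch below counts the bytes each of the three current formats needs for a few characters. The helper name encoded_sizes is made up for this example, and I use the little-endian codec variants so that Python does not prepend a byte order mark to the output.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes text occupies in UTF-8, UTF-16 and UTF-32."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(encoded_sizes("Z"))   # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("€"))   # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("😀"))  # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

UTF-8 stays at one byte for ASCII characters such as "Z", UTF-16 grows from two bytes to four for the emoji, and UTF-32 always uses four.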

In my next blog post I will dive a little deeper into UTF-8.
