Microsoft in the News
Continuing with the theme of intelligent retail from my last blog, Microsoft announced new capabilities for Teams for Firstline Workers.
There are new, customizable mobile Teams features that make it easier for front line workers to connect with others in the store, or throughout the organization, and give them easy access to all the apps and services they need while on the clock.
Teams now includes an API that will allow the retail staff to connect to workforce management systems. With this, Teams will now act as a hub for accessing different systems.
Since feedback is such an important part of keeping your staff happy and engaged, Microsoft has added a new Praise Tool. Managers and employees can now use this to improve workplace culture by publicly recognizing a co-worker’s efforts.
For more details about how Teams is improving to help retail businesses, follow the link: https://www.microsoft.com/en-us/microsoft-365/blog/2019/01/09/3-new-ways-microsoft-teams-empowers-firstline-workers-to-achieve-more/
UTF-8
UTF-8 (Unicode Transformation Format, 8-bit) was first introduced to the world in 1993 at a conference in San Diego. By 2008 it was the most commonly used encoding and by 2009 it was the dominant encoding on the internet. It is currently used in about 92.4% of all web pages, and by 954 of the 1,000 highest-ranked web pages. The World Wide Web Consortium (W3C), the main international standards organization for the World Wide Web, recommends UTF-8 as the default encoding in XML and HTML. Finally, the Internet Mail Consortium (IMC) recommends that all email programs be compatible with UTF-8.

UTF-8 was created to be backwards compatible with 7-bit ASCII, which allowed for its rapid acceptance. Bytes above 127, as used by the 8-bit "extended ASCII" code pages, are not carried over unchanged. Under UTF-8, every character is encoded in 1 to 4 bytes. The first 128 characters (US-ASCII) require 1 byte. The next 1,920 characters need 2 bytes each, which covers almost all Latin-script alphabets as well as the Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N’Ko alphabets. The remaining characters of the Basic Multilingual Plane, which include most Chinese and Japanese characters, require 3 bytes each. Everything else, including most emoji, is represented using 4 bytes.
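The byte counts above follow directly from the numeric value of each code point. Here is a minimal sketch in Python; the function name `utf8_length` and the sample characters are my own, chosen only for illustration, and the result is cross-checked against Python's built-in encoder.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for a single Unicode code point.

    Surrogate code points (U+D800 to U+DFFF) are not treated specially
    in this sketch.
    """
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a valid Unicode code point")
    if code_point <= 0x7F:      # the 128 US-ASCII characters
        return 1
    if code_point <= 0x7FF:     # the next 1,920 characters
        return 2
    if code_point <= 0xFFFF:    # rest of the Basic Multilingual Plane
        return 3
    return 4                    # supplementary planes, including most emoji


for ch in ["A", "é", "Ω", "中", "😀"]:
    # Cross-check the ranges against Python's own UTF-8 encoder.
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(ch, f"U+{ord(ch):04X}", utf8_length(ord(ch)), "byte(s)")
```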
Some examples of what UTF-8 looks like in binary: the dollar sign ($, 00100100) and the letter A (01000001), being ASCII symbols, require only one byte each. The Euro sign (€, 11100010 10000010 10101100) and the Bitcoin sign (₿, 11100010 10000010 10111111) use three bytes each.
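If you want to see the binary form of these four symbols for yourself, a few lines of Python will do it. This is only a sketch; the helper name `utf8_bits` is hypothetical.

```python
def utf8_bits(ch: str) -> str:
    """Render the UTF-8 bytes of one character as space-separated binary octets."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))


# Dollar sign, letter A, Euro sign, Bitcoin sign.
for ch in ["$", "A", "€", "₿"]:
    print(f"{ch}  U+{ord(ch):04X}  {utf8_bits(ch)}")
```

The one-byte characters start with a 0 bit, while the three-byte characters start with 1110 and each following byte starts with 10.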
UTF-8 has the advantage of being quick to encode because it uses only simple bit operations. Other encodings, such as Shift JIS, use mathematical operations such as division and multiplication, which need more processing power and can be several times slower.
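To show what "simple bit operations" means in practice, here is a hand-written encoder that uses nothing but shifts, masks and ORs. It is a teaching sketch, not how any particular library implements it, and the name `encode_utf8` is my own.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one code point to UTF-8 using only shifts, masks and ORs."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b00111111),
        ])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b00111111),
            0b10000000 | (code_point & 0b00111111),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0b11110000 | (code_point >> 18),
        0b10000000 | ((code_point >> 12) & 0b00111111),
        0b10000000 | ((code_point >> 6) & 0b00111111),
        0b10000000 | (code_point & 0b00111111),
    ])


# The hand-written encoder agrees with Python's built-in one.
for ch in "A€₿😀":
    assert encode_utf8(ord(ch)) == ch.encode("utf-8")
```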
Encodings that are specific to a script, such as some East Asian encodings, can require less storage space than UTF-8 for text in that script because they use 2 bytes per character instead of 3.
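A quick way to see the difference is to encode the same Japanese text both ways. The sample string is my own, and I am using Python's `shift_jis` codec as one example of a script-specific encoding.

```python
text = "日本語"  # three kanji, chosen only as an example

utf8_size = len(text.encode("utf-8"))           # 3 bytes per character here
shift_jis_size = len(text.encode("shift_jis"))  # 2 bytes per character here

print(f"UTF-8: {utf8_size} bytes, Shift JIS: {shift_jis_size} bytes")
```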
In North America it is rare to see the unknown-character symbol, usually rendered as the Unicode replacement character � (U+FFFD).
This symbol shows up when text is decoded with a different encoding from the one it was written in. In Japan, where JIS encodings are widespread, such mismatches are so common that there is a word for the unreadable garbage they create: 文字化け (mojibake, /modʑibake/)
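An encoding mismatch is easy to reproduce. The sketch below writes text as UTF-8 and then reads it back with the wrong codec; the variable names and sample strings are my own, and the exact garbage you get depends on which wrong codec is used.

```python
original = "文字化け"
utf8_bytes = original.encode("utf-8")

# Wrong codec: UTF-8 bytes read as Shift JIS produce unreadable garbage.
garbled = utf8_bytes.decode("shift_jis", errors="replace")

# Right codec: the text comes back intact.
restored = utf8_bytes.decode("utf-8")

# A byte that is not valid UTF-8 (here a Latin-1 "é") is replaced by U+FFFD,
# the unknown-character symbol.
replacement = b"caf\xe9".decode("utf-8", errors="replace")

print(garbled)
print(restored)
print(replacement)
```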
The solution is to standardize, and that is what the Unicode Consortium is doing. It has standardized the characters used by computers around the world, but this standard can be encoded in several ways. Encodings such as UTF-8, UTF-16 and UTF-32 each cover every Unicode character, but in different ways. UTF-32, for example, uses 4 bytes for every character, whereas UTF-8 is variable length and uses 1 to 4 bytes, depending on the character. The original design allowed sequences of up to 6 bytes, but since 2003 (RFC 3629) UTF-8 has been limited to 4 bytes. Of the three, only UTF-8 is backwards compatible with ASCII.
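The difference between the three encodings is easiest to see by counting bytes. In this sketch the helper name `encoded_sizes` is hypothetical, and I use the little-endian codecs so that Python does not add a byte order mark to the count.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded size of text, in bytes, for UTF-8, UTF-16 and UTF-32."""
    codecs = {"UTF-8": "utf-8", "UTF-16": "utf-16-le", "UTF-32": "utf-32-le"}
    return {name: len(text.encode(codec)) for name, codec in codecs.items()}


for sample in ["A", "€", "😀", "Hello"]:
    print(sample, encoded_sizes(sample))

# ASCII text is byte-for-byte identical in UTF-8, which is not true of
# UTF-16 or UTF-32.
assert "Hello".encode("ascii") == "Hello".encode("utf-8")
assert "Hello".encode("ascii") != "Hello".encode("utf-16-le")
```

Note that the Euro sign is smaller in UTF-16 (2 bytes) than in UTF-8 (3 bytes), so UTF-8 is not the smallest choice for every character.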

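UTF-8 is self-synchronizing: the first byte of each character indicates how many bytes follow, and continuation bytes are always recognizable as such. Because it is byte-oriented, it does not need a byte order to be specified. Text can be read from any point in the file, and the parser will find the start of the next character and decode from there. This makes it better at recovering from errors.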
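Here is a small sketch of how a parser can start in the middle of a file. Continuation bytes always have the bit pattern 10xxxxxx, so the reader skips them until it reaches a byte that starts a character. The function name `decode_from` and the sample text are assumptions made for this example.

```python
def decode_from(data: bytes, offset: int) -> str:
    """Decode UTF-8 starting at an arbitrary byte offset.

    If the offset lands in the middle of a character, skip forward to
    the first byte that begins a new character.
    """
    while offset < len(data) and (data[offset] & 0b11000000) == 0b10000000:
        offset += 1
    return data[offset:].decode("utf-8")


data = "€100 or ₿1".encode("utf-8")

# Offset 1 falls inside the three-byte Euro sign, so decoding resumes at "1".
print(decode_from(data, 1))
```

At most three bytes ever need to be skipped, because no UTF-8 character has more than three continuation bytes.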
The advantages of UTF-8, and the reasons why it has become the default encoding, are as follows:
- Success breeds success. In other words, it is universal because everyone else is using it.
- All Unicode characters can be used.
- Microsoft recommends it, despite having its own Windows encodings.
- It is backwards compatible with ASCII (the initial success has a lot to do with this).
- XML uses it as its default encoding.
- For most text, especially ASCII-heavy text, it uses fewer bytes than UTF-16 or UTF-32, which saves storage space.
It is the sixth and last point in the list, storage savings, that makes the introduction of UTF-8 in SQL Server 2019 the impetus for this blog series.
In my next blog I will start looking at the adoption of UTF-8 within SQL Server.